SOME INTRODUCTORY COMMENTS


A. S. Barr 
University of Wisconsin 


The research herein reported is the product 
of the efforts of many persons working to- 
gether cooperatively. The project was initi- 
ated during the academic year of 1934-35. 
Throughout this year a group of some twenty 
persons representing the State Department of
Public Instruction; the Department of Education,
University of Wisconsin; and a number
of local school systems, met every other
Saturday morning to discuss possible ap- 
proaches to a systematic study of the problem 
of teacher evaluation. The result of this year’s 
work was a statement of the problem and a 
tentative formulation of the procedure to be 
followed in this study. During the academic 
year of 1935-36, this tentative formulation 
was turned over to a graduate seminar in 
Education for their critical analysis and re- 
vision: during the first semester of this year 
the procedure was rewritten in light of ideas 
gained from a careful survey of previous in- 
vestigations; during the second semester the 
procedure was tried out in a preliminary in- 
vestigation and again rewritten. The proce- 
dure as then formulated provided the general 
pattern for the series of investigations here 
reported. The data for these investigations 
were collected for the most part during the 
academic years of 1936-38. The statistical 
analysis of the data consumed the greater 
part of the period between 1938 and
1944. The report here presented is a collection 
of the several studies arising from the plan 
thus projected.¹

¹ The authors are grateful for much help received from
many different persons. First of all, the authors wish to
acknowledge the many hours of valuable time given by the
THE PROBLEM


To select, guide, and educate teachers as
effectively as we should, we must know much
more than we do now about the prerequisites 
to teaching efficiency and how to identify and 
describe these prerequisites accurately. The 
purpose of this investigation, or series of in- 
vestigations, is to study these prerequisites, 
their inter-relatedness, and the validity of the 
instruments commonly employed in collecting 
data about them. More specifically an attempt 
is made to answer the following questions: 
(1) What are the prerequisites to teaching 
efficiency, particularly for teachers of the 
social studies in the 7th and 8th grades of 
Wisconsin rural schools? (2) How valid and 
reliable are certain of the instruments com- 
monly employed in measuring teacher effici- 
ency and its prerequisites? And (3) how do 
the prerequisites to teaching efficiency, as 
measured in this investigation, seem to be 
interrelated? Besides these more specific pur- 
poses, it is hoped that this investigation may 
throw some light, too, upon the general nature 
and organization of human abilities. 

There are many important problems in the 
field of teacher evaluation not here studied. 
Those enumerated here seem most appropriate 
to the conditions under which the investiga- 
tion was conducted. 


THE IMPORTANCE OF THE PROBLEM 


There are approximately one and a quarter 
million teachers in this country who teach 
some thirty million pupils. The schools in 
which these pupils and teachers work con- 
stitute one of the country’s most extensive 
enterprises and in a very real sense supply the 


Dean of the School of Education, have given valuable editorial
help. The full list of those who have assisted is too
long to reproduce here. The authors wish to acknowledge, too,
the financial aid given by the Works
Progress Administration and the Graduate Research Committee
of the University of Wisconsin, both of which made sizeable
contributions to the support of this investigation. The assistance
of all of these is most gratefully acknowledged.
foundations for the democratic order for
which we all strive. Next to the pupil, the 
teacher is the most important single factor in 
this great enterprise. She is the central im- 
pelling force in our educational effort. With- 
out good teachers there cannot be good 
schools. 

To get good teachers there must be wise 
selection and guidance, good preparation, and 
sound employment and placement practices. 
The education of teachers should be predi- 
cated upon discriminating selection. But to 
guide and select wisely, one must have accu- 
rate knowledge of the prerequisites to teach- 
ing efficiency and possess the means of identi- 
fying these prerequisites in a trustworthy 
fashion. The effective education of teachers, 
both before and after they enter service, de- 
pends in a very real sense upon our ability 
to identify progress in attaining teaching efficiency
and its prerequisites. Inefficiency in
evaluation leads to inefficiency in teacher edu- 
cation. Only by knowing the results of our 
efforts to educate teachers may we improve 
the process. Not only do those responsible for 
teacher selection, guidance, and education 
need more precise information relative to 
teaching efficiency, but administrators and 
placement officials, too, need this information 
for effective employment, assignment, and promotion
practices. The bases upon which these
responsibilities are now discharged by many
officials are scarcely worthy of the profession.
The fair treatment of teachers and pupils is
likewise involved in this problem, as is the
quality of the service rendered by schools.
There is already available considerable evi- 
dence, subjective and objective, to indicate 
that current methods of evaluating teaching 
efficiency are inadequate. The teacher, the 
professional educator, the administrator, the 
pupils and the public would all profit by 
better measures of teaching efficiency. 


THE GENERAL PLAN OF THE INVESTIGATION


It appeared from past experience with 
similar studies that the purposes of this in- 
vestigation might be best served by the use 
of a semi-controlled statistical technique of 
research applied under normal classroom con- 
ditions. The application of this technique as 
here used involved the following steps: (1) 
the definition of the task to be performed; 
i.e., the changes to be sought in the pupils; 
(2) the measurement of pupil growth by the 
application of certain instruments of measurement
before and after teaching; (3) the systematic
measurement and control of factors
which seem to condition pupil growth; (4) 
the definition and measurement of certain 
teaching factors chosen for study, and (5) the 
systematic study of the relationships between 
these factors and teaching efficiency as herein 
defined. 

One of the important phases of this study 
is its attempt to define teaching efficiency in 
terms of its effects. This idea is not new, not 
even in the field of professional education,¹
but further effort in this direction seemed 
desirable. Much confusion has arisen in the 
field of human evaluation through failure 
to establish a definite point of departure. 
Notable success has been achieved in certain 
of the physical sciences by defining very elu- 
sive phenomena through their effects. The 
same thought has here been applied. 


THE TEACHERS AND PUPILS STUDIED


The investigations here reported were car- 
ried out in the main with seventh- and eighth- 
grade teachers of citizenship in non-depart- 
mentalized one- and two-room rural schools 
in the state of Wisconsin. Several of the 
studies employed supplementary data from 
other sources. The main body of data for the 
investigation, however, consists of measures 
of three groups of teachers and pupils: (1) a 
group of 24 teachers teaching in state graded 
schools with 342 pupils; (2) a group of 47 
teachers teaching in one-room rural schools 
with 338 pupils; (3) a group of 31 teachers 
with 181 pupils in one- and two-room schools. 
One group of 24 teachers with 194 pupils 
was investigated in a follow-up study extend- 
ing through a period of two years. In several 
instances data were secured relative to other 
teachers and pupils to supplement those 
secured with respect to the main group of 
teachers here studied. Descriptions of these 
and other teachers and pupils will be found 
in the reports to follow. 


THE CRITERION OF TEACHING EFFICIENCY


The principal criterion of teaching efficiency
employed in this investigation was a composite
of a number of measures of pupil
growth and achievement. In certain of the
studies, composites of the scores on teacher

¹ William H. Lancelot, A. S. Barr, Gilbert L. Betts, and others,
“The Measurement of Teaching Efficiency” (New
York: The Macmillan Co., 1935).
rating-scales and composites of measures of
certain qualities commonly associated with 
teaching efficiency constituted other criteria 
of efficiency. Considerable care was taken 
in the development of the criterion. For the 
primary criterion both unit and overall year- 
apart tests were employed. The overall tests 
were given at the beginning and at the end 
of the school year, approximately six months 
apart. The overall tests were chosen to relate 
to the more general purposes of the course in 
citizenship taught by the teachers here under 
investigation. The unit tests were given at the 
beginning and end of two standard tasks, each 
three weeks in length, chosen from the regular 
course in citizenship and defined in terms of 
the accepted objectives of the course. Each 
task was presumably of equal difficulty and 
applied to the instruction of groups of pupils 
of equal capacity, under comparable condi- 
tions. Finally, certain measures employed for 
equating purposes were administered to the 
pupils. The measures used in the equating of 
pupil-groups were those of intelligence, read- 
ing ability, and other factors thought to be 
related to pupil growth and achievement. 


In determining the contribution of each 
teacher to the total learning-teaching situa- 
tion, the growth or achievement of the pupils 
under her direction was determined by sub- 
tracting the initial test scores of her pupils 
from their final test scores. In this fashion a 
pupil change-score was secured for each class. 
The part of each gain-score attributable to 
the effort of the teacher was considered to 
be the difference between the gain-scores 
secured from the application of the tests to 
the pupils at the beginning and end of the 
experimental period, and the score predicted 
for each pupil from measures of the pupils’ 
intelligence and other factors thought to influ- 
ence achievement. The residual pupil gain was 
attributed to the teacher and other uncon- 
trolled factors. The uncontrolled factors were, 
for the purpose of this study, assumed to be 
randomly distributed. The statistical proce- 
dures employed in making these calculations 
and in the development of the several criteria
of teaching efficiency will be described in the 
special reports to follow. 
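
The computation just outlined may be illustrated by a minimal sketch in Python. It is offered only as an illustration and is not the procedure of the special reports. It assumes, for simplicity, a single equating measure (here a hypothetical `ability` score), a straight-line prediction of gain from that measure, and invented scores for two hypothetical teachers; the names `fit_line` and `teacher_residual_gains` are likewise assumptions of the sketch.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for the line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def teacher_residual_gains(pupils):
    """Mean residual gain per teacher.

    Each pupil is a dict with the keys 'teacher', 'initial', 'final',
    and 'ability' (a hypothetical single equating measure).
    """
    # Gain-score: final test score minus initial test score.
    gains = [p["final"] - p["initial"] for p in pupils]
    # Gain predicted for each pupil from the equating measure alone.
    a, b = fit_line([p["ability"] for p in pupils], gains)
    by_teacher = {}
    for p, gain in zip(pupils, gains):
        # Residual: the part of the gain not predicted from the pupil measure,
        # attributed to the teacher and to uncontrolled factors.
        residual = gain - (a + b * p["ability"])
        by_teacher.setdefault(p["teacher"], []).append(residual)
    return {teacher: mean(r) for teacher, r in by_teacher.items()}


# Invented data: two hypothetical teachers, three pupils each.
example_pupils = [
    {"teacher": "A", "ability": 90, "initial": 20, "final": 32},
    {"teacher": "A", "ability": 100, "initial": 20, "final": 33},
    {"teacher": "A", "ability": 110, "initial": 20, "final": 34},
    {"teacher": "B", "ability": 90, "initial": 20, "final": 30},
    {"teacher": "B", "ability": 100, "initial": 20, "final": 31},
    {"teacher": "B", "ability": 110, "initial": 20, "final": 32},
]
scores = teacher_residual_gains(example_pupils)
```

With these invented figures the two hypothetical teachers receive mean residual gains of equal size and opposite sign, since the pupils of one gained uniformly more than was predicted and the pupils of the other uniformly less.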

It is probably unnecessary to say that the 
investigators considered the several criteria of 
teaching efficiency employed in this investigation
a very important matter and gave them very
careful consideration. The worth of all that
follows depends in a very real measure upon
the adequacy of the criteria of teaching effi- 
ciency employed. While no attempt has been 
made to measure all of the worthwhile out- 
comes of education, it is hoped that those 
chosen for study in this investigation will illus- 
trate some of the more important problems 
arising from attempts at the systematic evalu- 
ation of teaching efficiency. 


DEFINING THE TASK 


The tasks to be performed by the teachers 
and pupils participating in this investigation 
were defined in terms of unit and course 
objectives. 

To provide a satisfactory point of departure 
for the investigation an attempt was made to 
explore the more general purposes of citizen- 
ship instruction. This exploratory study in- 
volved analyses of expert opinion as found in 
current statements² of the purposes of citizenship
education and as revealed in two special
investigations³ in this area. The main source
of opinion was found in articles on the teach- 
ing of the social studies, each of which was 
carefully studied for expert statements of the 
purposes of education. The objectives for both 
unit and course assignments were carefully 
defined and made known to the teachers par- 
ticipating in this investigation. 

Considerable time was taken in the word- 
ing of objectives. There appear to be two 
quite different forms in which the purposes 
of education have been stated: (1) state- 
ments in terms of pupil behavior, and (2) 
statements in terms of controls over behavior. 
These latter seem to be of two sorts: (1) 
traits, qualities, and characteristics of indi- 
viduals such as honesty, open-mindedness, 
considerateness, etc., and (2) the mental prerequisites
to successful performance (behavior),
such as knowledges, skills, attitudes,
ideas, interests, and appreciations. Each of 
these forms has its own peculiar advantages 
and disadvantages, educational and psycho- 
logical. A composite approach was employed, 
as an examination of the statement of objec- 
tives of the work in citizenship will indicate. 


² Charles A. Beard, The Nature of the Social Sciences, Part
VII, Report of the Commission on the Social Studies of the
American Historical Association (New York: Charles Scribner’s
Sons, 1934).

³ Edith Herrin Cocke, A Study of the Objective in Teaching
[...] Government, [...] Thesis (Madison, Wis.:
University of Wisconsin, 1936).

Barton J. Rogers, The Objectives of Community Civics,
[...] (Madison, Wis.: University of Wisconsin,
1936).
TEACHER AND PUPIL MEASURES


Various measures were applied to both 
teachers and pupils. In choosing pupil meas- 
ures an attempt was made to select those that 
measured not only knowledge but attitudes, 
skills, and behavior as well. While the number 
of measures varied from study to study, some 
twenty-three measures, mostly tests, were 
applied to the pupils altogether. To measure 
some of the less tangible outcomes of citizen- 
ship training, the tests available in this area, 
such as, for example, the Wrightstone tests, 
were supplemented by especially constructed 
questionnaires and rating-scales. The original 
plan included behavior records for each pupil, 
but these somehow became lost in the process.
As one looks back over the several measures 
employed and the data collected, one can only 
regret that they were not more adequate. 

Some twenty-five different measures were
applied to the teachers, including measures of 
the teachers’ knowledge of the subject matter 
of the courses taught, intelligence, socio- 
economic status, skill in expression, personal 
fitness, social adjustment, emotional stability, 
teacher-pupil relationships, leadership, and 
interest in teaching. While a very large number
of measures were applied to the teachers,
the list of qualities measured was by no means 
complete, and in many areas the measures 
were quite inadequate. It seemed not feasible, 
for example, to measure the teachers’ health, 
energy, and drive. No data seem to have been 
collected, either, with reference to the general 
cultural attainment of the teachers included 
in the study. There are, undoubtedly, many 
other important aspects of teaching ability 
not here measured. 


Other information collected, with reference 
to certain of the teachers, included two sound 
records of lessons taught by each of twenty- 
five teachers; these records constituted a 
major source of data for a detailed activity- 
analysis of teaching. Samples of the work of 
pupils and the teachers’ teaching outlines were 
also collected. All teachers were given, too, the 
unit pretests given the pupils, and each 
teacher filled out a detailed information blank. 
For a listing and a description of the measures 
employed in the several investigations here 
reported, the reader is referred to the indi- 
vidual reports to follow. 
THE COLLECTION OF DATA


The data were collected for the most part 
by advanced graduate students pursuing work 
for the doctor’s degree in education at the 
University of Wisconsin. All of the teachers 
were visited one or more times, most of them 
many times. In the first investigation the 
tests taken by the pupils were administered 
by the teacher; in the later investigations 
the pupil tests were administered by trained 
investigators. With a few exceptions the 
tests were administered to the teachers, 
individually or in small groups, by trained 
investigators. Care was taken to make the 
data as accurate as possible. More detailed 
information relative to the methods of collect- 
ing the data will be given in the special reports 
to follow. 


THE CONTROL OF FACTORS AFFECTING THE
OUTCOMES OF INVESTIGATION


While this investigation was of the field study 
sort, it seemed important to the investigators 
that the factors conditioning pupil growth and 
achievement be controlled as carefully as pos- 
sible. To do this the following precautions 
were taken: 


1. The subjects employed in the investiga- 
tion were chosen from definite types of schools 
and homes, namely, 7th- and 8th-grade pupils 
enrolled in one-room rural and village state- 
graded schools, all engaged in the study of 
citizenship. Each sample was tested for 
homogeneity, and carefully described. 


2. The teachers chosen for investigation 
were all selected from non-departmentalized 
schools. The investigators could discover no 
easy way of discounting the effects of other 
teachers upon pupils in departmentalized 
schools. 

3. An attempt was made to equalize the 
amount of time devoted to the study of citi- 
zenship. Each day’s work was to consist of 
forty minutes in one class period or two 
periods of twenty minutes each. There was to 
be no home work. Where there were field trips 
and other activities these were to be employed 
in such a manner as not to increase the total 
time given to the work in this field. 

4. Realizing that the equipment with which 
teachers would work might vary considerably, 
the investigators attempted to supply each 
teacher, upon request, with supplementary 
reading materials, equipment, and visual aids.
A list of these materials was supplied each 
teacher. 

5. An attempt was made to equate pupil
ability through the use of pre-tests in the subject
matter areas, tests of intelligence,
measures of socio-economic status, and tests of
reading ability.


THE STATISTICAL TREATMENT OF THE DATA 


A variety of statistical procedures have 
been employed in the treatment of the data 
collected. The study is, however, principally 
a correlation study supplemented and enriched 
by many other techniques such as case studies, 
regression techniques, and factor analysis. In 
general, an attempt has been made to employ 
up-to-date procedures appropriate to the task 
at hand. The detailed procedures employed in 
this investigation will be described in the 
special reports to follow. 


SPECIAL INVESTIGATIONS INCLUDED IN
THIS REPORT


Within the general framework here de- 
scribed seven special investigations were made 
as follows: 
1. The Measurement of Teaching Ability 
(first investigation). L. E. Rostker. 

2. The Measurement of Teaching Ability 
(second investigation). J. E. Rolfe. 

3. The Measurement of Teaching Ability 
(third investigation). C. V. LaDuke. 

4. A Study of the Relationship Between
Teaching Procedures and Educational
Outcomes. C. D. Jayne.

5. The Improvement of Teaching Efficiency
Through Supervision. C. R. Von Eschen.

6. Personality and Teaching Efficiency.
R. E. Gotham.

7. A Factor Analysis of Teacher Abilities.
A. G. Hellfritzsch.





THE MEASUREMENT OF TEACHING ABILITY 
STUDY NUMBER ONE 


L. E. ROSTKER
New York City 


STATEMENT OF THE PROBLEM 


The central problem of this study was to 
determine the relationship between selected 
teacher measures as applied to 7th- and 8th-grade
teachers of the social studies in non-departmentalized
rural and village schools,
and the changes produced by these teachers 
in their pupils. Differently stated, the purpose 
of this study was to determine the validity of 
selected teacher-measuring devices when vali- 
dated against the criterion of pupil changes. 
Another problem of this study was to deter- 
mine what combination of teacher measures 
gives the highest correlation with teaching
ability as measured by the criterion of pupil 
changes. 
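
The first of these problems, the validation of a single teacher measure against the criterion, may be sketched as the computation of a correlation coefficient between teachers' scores on the measure and their pupil-change scores. The Python sketch below is illustrative only: the measure names and all scores are invented, the function names `pearson_r` and `rank_measures` are assumptions of the sketch, and it treats each measure separately rather than seeking the best combination of measures.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_measures(measures, criterion):
    """Return (name, r) pairs, largest absolute correlation first."""
    pairs = [(name, pearson_r(scores, criterion))
             for name, scores in measures.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Invented scores for five hypothetical teachers.
example_criterion = [1, 2, 3, 4, 5]  # pupil-change criterion, one value per teacher
example_measures = {
    "subject_matter_test": [2, 4, 6, 8, 10],
    "rating_scale": [3, 1, 4, 1, 5],
}
ranked = rank_measures(example_measures, example_criterion)
```

A measure that ranks high in such a list is one whose scores vary closely with the criterion in the sample at hand; the coefficient says nothing of the adequacy of the criterion itself.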


SECTION I 


THE CRITERION OF TEACHING 
ABILITY 


The criterion of teaching ability accepted 
for this study was the measurable changes 
produced in pupils by their teachers. 

There has been considerable discussion as 
to the advisability of using measurable pupil
changes as a criterion of teaching ability.
Pittinger¹ pointed out that pupil achievement
is not the result of any single teacher’s effort
but rather the resultant of the efforts of a number
of teachers. Symonds² was of the opinion that
since pupils came to their teachers with varying
degrees of intelligence and amounts of
preparation, classroom achievement could be
no valid measure of teaching ability until
better professional tests were available. Fritz³
concluded that the use of standardized tests
measured chiefly the memorization of factual
materials. Shannon⁴ objected to the use of


¹ B. F. Pittinger, “Problems of Teacher Measurement”,
Journal of Educational Psychology, VIII (1917), pp. 103-

² Percival M. Symonds, “The Measurement of Teaching [...] in High
School”, Educational Administration and Supervision, [...], pp. 217-231.

³ [...] Fritz, “The Prediction of Probable Teaching [...]”,
Educational Administration and Supervision, XX [...].

⁴ [...] Shannon, “Difficulties in Estimating the Efficiency
of Teachers”, The National Elementary Principal, [...]
of the Department of Elementary School Principals, [...]
No. 6 (July, 1937), pp. 524-529.


pupil changes as the criterion of teaching 
ability because the tests available did not 
measure all pupil aspects.

The use of measurable pupil changes as the 
criterion of teaching ability has, on the other 
hand, found many champions. Courtis⁵ stated
his position by saying: “The only position I
am willing to accept must be in terms of
changes in the pupils taught.” Corey⁶ suggested
that if a large number of teachers were
employed and experiments were carefully controlled,
measurable pupil changes could be
used as the criterion of teaching ability.
Trow⁷ stated that since “the task of the
teacher is . . . to assist the pupils to learn
. . . it is on the basis of the extent to which
pupils do learn that teaching should be
judged.”

For this study the assumption was made 
that a good teacher is one who produced 
desirable measurable changes in her pupils. 
Since a teacher is engaged to teach and to 
modify the behavior of her pupils, the degree 
to which changes are produced in her pupils 
is a reflection of the ability of the teacher. 

In accepting measurable pupil changes as 
the criterion of teaching ability, a number of
assumptions were made. First, that there are 
certain factors associated with pupil perform- 
ance, such as intelligence, reading ability, and 
socio-economic status, which affect pupil per- 
formance, and which factors vary in degree 
from pupil to pupil. It is necessary to elimi- 
nate the varying effects of these factors so 
that whatever pupil changes occur can be 
attributed to the teacher rather than to the 
influence of pupil factors. Secondly, it would 
be difficult to assume identical curricula in 
the number of different classes used in this 
study. Nor is the identity of curricula desir- 
able if teaching is to provide for individual 
differences. The teachers participating in this 


⁵ [...] A. Courtis, “The Measurement of the Efficiency of
[...]”, Educational Administration and Supervision,
XVI [...], pp. 401-412.

⁶ Stephen M. Corey, “The Present State of Ignorance About
Factors Affecting [...]”, [...] Administration
[...].

⁷ [...] Trow, “[...] Eval[...]”, Educational Administration
and Supervision, XX [...].
study were, therefore, told that they would
be expected to teach two three-week units of 
work, one in the fall of the year, the other in 
the following spring. For these units, the 
teachers were given the desired general objec- 
tives and broad topical outlines and told that 
they could teach whatever subject matter they 
chose, providing that the materials chosen fell 
within the limits of these units. The assump- 
tion was made that a good teacher would 
wisely choose her materials of instruction and 
that the changes made by her pupils would 
tend to indicate whether her choice had been 
wise. The same position was taken with refer- 
ence to methods of instruction. These must 
also be chosen with regard to the varying 
capacities, aims, needs, etc., of the pupils in 
each class. There can be no uniform method 
of teaching if individual differences among 
pupils are to be recognized, promoted, and 
preserved. Each teacher was expected to ad- 
just her methods to her pupils. A good teacher, 
then, is one who wisely and carefully chooses 
her methods of instruction. 

It cannot, however, be over-emphasized 
that the measurable pupil changes obtained in 
this study are limited by the type of tests 
applied to the pupils. The use of pupil changes 
as the criterion of teaching ability depends
upon the tests applied to the pupils and what- 
ever implications are to be drawn must be 
limited by the tests employed. 


EXPERIMENTAL PATTERN 


The data for this study were obtained during
the school year 1936-37 from 28 seventh-
and eighth-grade classes offering citizenship,
in non-departmentalized schools. 

To overcome the criticism that measurable 
pupil changes are the result of the efforts of 
a number of teachers, non-departmentalized 
schools were used. In this way, the measur- 
able changes in pupils for a given school year 
could be attributed to a single teacher. This 
was a restriction rigidly enforced when schools 
were asked to cooperate in this study. Conse- 
quently, the number of eligible schools was 
markedly restricted. 

The initial proposal for this study called 
for a representative sample of approximately 
100 non-departmentalized, eighth-grade 
classes, but when attempts were made to carry 
out this plan, it was realized that such a pro- 
gram, because of financial and clerical diffi- 
culties, would be impossible. It was then nec- 


essary to limit the number of participating 
schools to those within a reasonable traveling 
distance from the center of operations, namely 
Madison, Wisconsin. 

The selection of the seventh and eighth 
grades, as the level on which data were to be 
collected, was purely arbitrary. 

A number of schools meeting the above re- 
quirements, as ascertained from the Official 
School Directory of the state of Wisconsin, 
were visited immediately after the opening of 
the school term in the Fall of 1936, and the 
eligible teachers were informed that partici- 
pation in this study was entirely voluntary 
on their part; that there was no compulsion 
of any sort to be applied to them for partici- 
pation, and that the results obtained from 
them and their classes would be treated pri- 
vately and confidentially without transmitting 
such data to their principals and supervisors.⁸
The response of the teachers was overwhelmingly
in favor of participation, and a group of
28 schools located in southern Wisconsin,⁹
with each school having at least one eighth
grade, and a total pupil population of 375, was
selected for participation in this study.

The plan proposed and executed was: (1) 
to measure pupil performance near the begin- 
ning of the school year and near the close of 
the school year so as to obtain long-time pupil 
changes occurring over approximately six 
months; (2) to measure pupil performance 
just prior to the teaching of and immediately 
after the teaching of two three-week units of 
work in the general field of citizenship,—one 
of these units to be given in the fall of the 
school year, the other unit in the spring of 
the same school year,—so that pupil changes 
on two short-units of work would be obtained; 
and (3) to apply various measures and rating-scales
to the teachers, preferably in the
fall of the school year, with the exception of 
those tests taken by both teachers and pupils 
which would be given concurrently. 

Several weeks before any teaching of the 
units occurred, each teacher was sent a letter 
in which were stated the dates when testing 
was to begin, a statement of the topics, and 
the general objectives in terms of desired 
goals for which the teachers, teaching the first 


⁸ Mr. L. H. Mathews shared the responsibility of visiting
and procuring a large number of the participating schools, of
administering pupil and teacher tests, and of correcting the
tests obtained from the schools with which he worked.

⁹ The location of the 28 schools is shown on a county map
of the state of Wisconsin (Appendix B, Plate I). (Original
thesis on file at the University Library, University of Wisconsin.)





8 JOURNAL OF EXPERIMENTAL EDUCATION 


unit, “Safeguarding Public Health”, were to
strive.¹⁰
The topics to be included in the first unit 
were: 
1. Securing pure air, food, and sunshine.
2. Disposing of wastes.
3. Providing desirable housing.
4. Caring for the physically and mentally
sick.
5. Recreational opportunities.


The goals toward which the teachers were 
to direct their instruction for this unit were: 
(1) “to acquire the kinds and amounts of 
information essential to the understanding of 
the problems and issues involved in safeguard- 
ing public health”; (2) “to develop skill in 
forming judgments about this subject”; (3) 
“to develop desirable attitudes relative to safe- 
guarding public health”; and (4) “to lead the 
pupils, individually and cooperatively, to some 
positive action relative to safeguarding public 
health”. 

Before any teaching was begun on the first 
of the two units, each pupil included in this 
study took the Kuhlmann-Anderson Intelligence
Test, the Traxler Silent Reading Test,
the Sims Socio-Economic Score Card, and 
a battery of tests consisting of the three 
Wrightstone tests and the three Hill tests.¹¹
At the same time a battery of tests was 
applied to the teachers. 

Several days later, usually at the beginning 
of a school week, the Health test designed to 
measure the work of the first unit, “Safe- 
guarding Public Health”, was given to pupils 
and teachers. 

Teaching on the first unit then continued 
for 13 successive school days and on the 15th
day, the same test given at the beginning of 
the unit as the pre-test, was again admin- 
istered to the pupils. 

Following the close of the first unit, several 
months intervened in which the teachers re- 
sumed their normal course of study. In the 
Spring of the same school year (early March, 
1937), the participating teachers were in- 
formed that on certain dates they would be 
requested to begin teaching a three-week unit 
on “Community Planning”.

As in dealing with the first unit, the par- 
ticipating teachers were sent a list of the 

¹⁰ See Appendix C (Original Thesis on file at the University
Library, University of Wisconsin).


¹¹ The tests applied to both pupils and teachers will be
described in the following section. 
topics to be covered and the desired goals of
instruction to be achieved.¹²

The topics covered in this unit were: (1) 
layout of streets; (2) building zones; (3) 
beautifying the community; (4) keeping the 
community clean; and (5) recreational facil- 
ities. 

The goals of instruction sought for this unit 
were the same as those sought in the first unit 
with the exception of differences in subject 
matter. 


Following the same procedure as in the 
Health unit, the Community Planning test, 
designed to cover the work and goals of the 
second unit, was applied to both teachers and
pupils. The teachers then taught this unit, 
employing whatever methods, materials, sub- 
ject matter, etc., they saw fit to use, for 13 
successive school days. On the 15th day, the 
same test, used at the beginning of this unit 
as an initial test, was again administered to 
the pupils. 

About two weeks after the final test on the 
second unit had been applied to the pupils, 
the pupils were given the same battery of 
tests, three Wrightstone and three Hill tests, 
which they had taken the preceding fall. In this
way, pre- and final-test results, as a long-time
measure of change, were obtained from the
pupils. 

As has been remarked, a battery of tests 
was applied to the teachers in the fall of the 
school year. This battery consisted primarily 
of those tests for which there were definite 
time limits to be observed; the teachers were 
requested to complete the other tests at their 
leisure since there were too many tests to be 
taken in a relatively short period of time. 
Later in the year, the same teachers were 
rated by their principals or supervisors on a 
battery of three rating scales; these teachers 
were also rated by the investigators,¹³ using
this same battery. 

The following testing schedule of one of the 
participating schools will illustrate the 
sequence of test administration: 


Sept. 29, 1936—Mailed letter to teacher 
stating topics and desired goals for the 
unit, Safeguarding Public Health, and 
informed teacher that on Oct. 13 and 14 

¹² See the Original Thesis (on file at the University Library,
University of Wisconsin) for directions and objectives
submitted to the teachers.

¹³ Mr. Lee Mathews and the author rated those teachers
with whom each was working.
a battery of tests would be given to her
pupils; teaching of the unit to begin on 
Oct. 19. 

Oct. 13-14, 1936—Battery of Kuhlmann-Anderson
Intelligence Test, Traxler Silent
Reading Test, Sims Socio-Economic 
Score Card, Wrightstone and Hill tests 
applied to pupils by investigator; a num- 
ber of teacher tests also applied at the 
same time. 

Oct. 19, 1936—Health unit test applied by 
investigator to pupils and teacher; this 
test used as pre-test for the first unit. 

Oct. 20-Nov. 9 (inc.), 1936—Teacher
taught unit on Health. (Wisconsin State 
Teachers Convention met on Nov. 5, 6, 
and 7 so that no classes were held on 
Nov. 5 and 6). 

Nov. 10, 1936—Investigator repeated 
Health unit test to pupils; this test now 
used as measure of final test status. 

Nov. 23, 1936—Received principal’s rating 
of her eighth-grade teacher on battery of 
three rating scales. 

Mar. 1, 1937—Sent teacher a list of topics 
and goals for the second unit, Commu- 
nity Planning, and advised her that pre- 
test would be given on Mar. 12, and final 
test on Apr. 2. 

Mar. 12, 1937—Investigator applied to 
pupils and teacher the Community Plan- 
ning test as the pre-test for the second 
unit. 

Mar. 15—April 2, 1937—Teacher taught 
second unit, Community Planning. (No 
school on Good Friday, Mar. 26). 

April 2, 1937—Investigator gave unit test 
on Community Planning to pupils. Test 
used as measure of final status for this 
unit. 

April 19, 1937—Battery of Wrightstone 
and Hill tests given to pupils as final test 
to measure long-time changes. 
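
The day counts in this schedule can be checked mechanically. The following is a minimal sketch in Python and is not part of the original study. The function name `school_days` is hypothetical, and the sketch assumes that classes met Monday through Friday and that teaching of the second unit ended on April 1, the day before the final test.

```python
from datetime import date, timedelta

def school_days(start, end, closed=()):
    """Return the weekdays from start to end (inclusive) that are not in `closed`."""
    days = []
    d = start
    while d <= end:
        # weekday() is 0 for Monday ... 4 for Friday
        if d.weekday() < 5 and d not in closed:
            days.append(d)
        d += timedelta(days=1)
    return days

# First unit: Oct. 20 to Nov. 9, 1936; no classes on Nov. 5 and 6 (teachers convention).
unit1 = school_days(date(1936, 10, 20), date(1936, 11, 9),
                    closed={date(1936, 11, 5), date(1936, 11, 6)})

# Second unit: Mar. 15 to Apr. 1, 1937 (assumed end); no school on Good Friday, Mar. 26.
unit2 = school_days(date(1937, 3, 15), date(1937, 4, 1),
                    closed={date(1937, 3, 26)})

print(len(unit1), len(unit2))  # 13 13
```

Under these assumptions both units come to 13 teaching days, which agrees with the schedule as reported.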


For each school, the time schedule was
carefully controlled so that, from school to
school, the periods between the units and the
total time period between the initial testing
and final testing on the battery consisting of
the Wrightstone and Hill tests were constant,
the latter being equal to approximately six
months. Since all of these tests had to be
administered by two investigators, it was
necessary to start the testing program in
these schools at different times.


The first group started on October 13; the 
second group started on October 21, and a 
third group was started on November 18. 

The units employed were selected because 
work of a similar nature was usually included 
in eighth-grade classes and because the work 
of these units was suggested in the state course 
of study. Teachers were urged to build these 
units as they would any other teaching units 
and not give them extraordinary preparation. 

In the original plan of this study, the deci- 
sion had been made that no teacher should be 
permitted to administer to her pupils any of
the tests used. It was decided that the administration
of tests would be only by the principals
or supervisors of the participating
schools or by the experimenter. Because of 
the lack of sufficient clerical aid, it was found 
necessary, however, to permit teachers, in a 
number of instances, to administer the tests 
to their pupils. The bulk of the testing was, 
nevertheless, not administered by the teach- 
ers. In every case, the number of tests admin- 
istered was carefully checked and collected by 
the experimenter as soon as possible, so that 
no test would remain in the teacher’s posses- 
sion for any length of time. No teacher cor- 
rected any of the tests administered either to 
her or to her pupils. The tests were corrected 
by the two investigators, and test results, giv- 
ing each pupil’s raw score and upper and 
lower quartiles, mean, median, and sigma for 
each test for each class, were submitted to the 
proper teachers. Each teacher was also sent 
the upper and lower quartile points and 
median for each test for the whole pupil group 
used in this study so that if interested, a 
teacher could ascertain how her group com- 
pared to the total pupil population. 
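
The class report described above can be illustrated with a short sketch. It is not taken from the original study: the function name `class_summary` and the raw scores are hypothetical, the quartile convention is an assumption (the text does not say how quartile points were located), and sigma is taken here as the population standard deviation.

```python
import statistics

def class_summary(raw_scores):
    """Summarize one class on one test: quartiles, median, mean, and sigma."""
    # Assumed convention: linear interpolation that includes the end points.
    lower_q, _, upper_q = statistics.quantiles(raw_scores, n=4, method="inclusive")
    return {
        "lower_quartile": lower_q,
        "median": statistics.median(raw_scores),
        "upper_quartile": upper_q,
        "mean": statistics.mean(raw_scores),
        "sigma": statistics.pstdev(raw_scores),  # assumed: population standard deviation
    }

# Hypothetical raw scores for a class of 11 pupils.
scores = [12, 15, 18, 20, 21, 23, 25, 27, 30, 31, 34]
summary = class_summary(scores)
print(summary)
```

A teacher comparing her class with the whole pupil group would set these class values beside the corresponding quartile points and median for all pupils.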

Each class was visited at least five times 
during the course of this study; a large number
of schools were visited as many as ten or
twelve times. This was considered to be of
importance especially since the investigators 
also rated the participating teachers with the 
same teacher rating scales as used by the 
principals or supervisors. 

The design of this study had been carefully 
considered so that no advantage to any class 
might result which would tend to invalidate 
the conclusions made from the data collected. 


PARTICIPATING TEACHERS 


The teachers participating in this study 
varied in a number of respects. From Table I, 
it is perceived that 17 teachers were women 
and 11 were men. The chronological age range
of this group was 24 years to 54 years with a
median age of 33 years and an average age
of 35.4 years.

The range of total teaching experience was
from 2 to 31 years with the median at 11
years and the average at 12.2 years. The
median number of years these teachers had
taught in the participating schools was 6 years
while the average was 6.6 years. About five-sevenths
of this group of teachers were teaching
combined 7th and 8th grades.


TABLE I
INFORMATION ABOUT THE TEACHERS PARTICIPATING IN THE STUDY

[Table body not recoverable from the scan. Column headings: Teacher; Sex; Age (Sept. 1936); Grades Taught; Grades Used in Study; Yrs. in Present Location; Total Teaching Experience; Institution from which Graduated; Additional Professional Work; Additional Experience. Legible entries include, under institution, Co. Norm., St. Norm., H.S., College, and University, and, under additional experience, Truck Driver, Clerking, Farming, Bookkeeper, Salesman, Construction, and P.O. Clerk.]


TABLE II
NUMBER OF PUPILS PARTICIPATING IN STUDY BY SCHOOL AND GRADE

[Table body not recoverable from the scan. Legible column headings: Grade; No. of Pupils.]


With few exceptions, this group had graduated
from state normal schools, and several
had continued their education by taking additional
work at the University level. Over half
of this group indicated that at one time they 
had received incomes from sources other than 
teaching; in a few cases, several of this group 
had, to some degree, been self-supporting 
while in their periods of professional training. 
The median teacher participating in this 
study can be described as a woman about 33 
years old, teaching a combined 7th- and 8th-grade
class, who had graduated from a three-year
normal school course and who had been
teaching a total of 11 years, of which the last 
six years had been spent in the participating 
school. She may have had some additional 
experience besides teaching, but had not done 
much to further her own formal education 
after graduating from the normal school. 


PARTICIPATING PUPILS


A total of 375 pupils participated in this
study (Table II).¹⁴ The range of the number
of pupils per class was from 6 to 35 with a
median of 11 and an average of 13.4. It is
obvious that these classes were small, but this
was due to the fact that the schools used were
town and district schools since most of the
larger schools in southern Wisconsin were
departmentalized.

Unfortunately, the pupils used in Schools
X and XVII comprised the combined 7th and
8th grades in these schools. This was necessary
because the same curriculum in the social
studies was taught to both of these grades.
In School XXVI, the 7th-grade curriculum
was similar to the 8th-grade social studies
curriculum of the other schools used in this
experiment. Initially, these schools (X, XVII,
and XXVI) were intended for final inclusion
with the others on the condition that there be
no significant differences in the abilities of
7th- and 8th-grade pupils to gain on the
measures used in this study. This condition
was subsequently found to be true.¹⁵

¹⁴ The original group of 400 was reduced to 375 because
of absences in class work and because of transfer to other
schools.

¹⁵ The difference between the mean raw score gain of 49
seventh grade pupils and 326 eighth grade pupils on a composite
of unit, Wrightstone and Hill tests, was 4.03 and the
sums of squares was 10735.27 and 125828.81 respectively.
The value of t corresponding to this difference was 1.5,
which is less than 1.96, the .05 value for t. See R. A. Fisher,
“Statistical Methods for Research Workers” (6th edition,
Edinburgh, Scotland: Oliver and Boyd, 1936), Section 24.1,
pp. 128-133.


TABLE III


INFORMATION ABOUT THE SCHOOLS PARTICIPATING IN THE STUDY

[Table body only partly recoverable from the scan. Column headings: Population of Town (1930 Census); Principal Occupation of Region; Type of School; Total Elem. Sch. Pupils; Total H.S. Pupils; Total of Pupils in Bldg.; Number of Teachers (El. S., H.S.). Legible entries under Principal Occupation of Region: Agriculture; Agric; Mining; Manuf; Commer. Of the 28 legible entries under Type of School, 18 read Elementary and 10 read Elem; HS.]


SCHOOL INFORMATION

The schools in this experiment, as indicated 
in Table III, represented both urban and rural 
areas. A number of these schools, VI, VIII,
XI, XVI, XIX, XXIV, XXVI, and XXVII,
were located in what were formerly prosperous
mining areas; the rest of the schools were 
located in farm and dairy regions. 

The median school could be described as 
an elementary school enrolling 125 pupils 
with a staff of 4 teachers, located in a town 
of about 790 inhabitants, in a rural region of 
southern Wisconsin. 


SECTION II 


DESCRIPTION OF PUPIL AND 
TEACHER MEASURES 


PUPIL TESTS


The purpose of this section is to describe 
the measuring instruments applied to the 
pupils and teachers participating in this
experiment.¹⁶

(1) The intelligence of each pupil used in 
this study was determined by application of 
the Kuhlmann-Anderson Intelligence Test,
Fourth Edition, Grades VII-VIII, 1933.¹⁷
This group test consists of ten scaled sub- 
tests with the scores on these sub-tests ex- 
pressed as mental ages. The median of these 
mental ages gives the total mental age for 
each pupil which, when divided by the appro- 
priate chronological age, gives the pupil’s 
intelligence quotient. 
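
The scoring rule just described can be sketched as follows. This is an illustration only, not the test authors' scoring procedure: the function name and the sample mental ages are hypothetical, ages are assumed to be expressed in months, and the quotient is multiplied by 100 according to the usual convention, a step the text does not state.

```python
import statistics

def intelligence_quotient(subtest_mental_ages, chronological_age):
    """Median of the sub-test mental ages divided by chronological age.

    Both arguments are assumed to be in months. The factor of 100 is the
    customary convention and is an assumption here.
    """
    mental_age = statistics.median(subtest_mental_ages)
    return 100 * mental_age / chronological_age

# Hypothetical mental ages (in months) on the ten sub-tests for one pupil.
mental_ages = [150, 156, 160, 162, 165, 168, 170, 172, 175, 180]
iq = intelligence_quotient(mental_ages, chronological_age=160)
print(round(iq, 1))  # 104.1
```

Because the median is used, one unusually high or low sub-test score has little effect on the pupil's total mental age.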

The authors of this test, deciding that the 
methods commonly used to determine the reli- 
ability did not adequately apply to psycho- 
logical tests, did not find the reliability of 
their instrument. Miller,¹⁸ however, in comparing
a battery of ten group intelligence
tests, found that the Kuhlmann-Anderson
test, when applied to a number of 7th-grade 
classes, yielded a reliability coefficient of .92. 

This test was used in the present study be- 
cause of its wide use in educational experi- 
ments and because it is designed to test lim- 
ited rather than large ranges of grades. 

(2) Since it had been assumed that pupil
changes might be conditioned by the reading
ability of pupils, each pupil in this experiment
was given the Traxler Silent Reading Test,
Form 1, Grades 7 to 10.¹⁹ The three parts of
the test were designed to measure (a) reading
rate and story comprehension, (b) vocabulary,
and (c) paragraph comprehension.

¹⁶ See the Original Thesis on file at the
University Library, University of Wisconsin.

¹⁷ ... Minneapolis, Minn.

¹⁸ Earl Miller, “A Comparative Study of Ten Group Intelligence
Tests on the High School Level”, (Unpublished
thesis) University of Wisconsin, 1930.

Reading rate is measured by the number 
of words read in ten seconds, and story com- 
prehension is measured by ten multiple-choice 
questions. The basis for the measurement of 
vocabulary was Thorndike’s “The Teacher’s 
Word Book.” Paragraph comprehension is 
measured by having the pupil read six scaled
paragraphs and attempt to answer a number
of questions on each of these paragraphs.

The reliability of the total test score, 
obtained by intercorrelating duplicate forms 
of the test, is given as .91 by Traxler.

(3) Educators and psychologists have for 
some time been aware that pupils’ school performance
is conditioned by their social environment
and economic status. Both Freeman²⁰
and Burks²¹ concluded that I.Q. can
be altered by changes in the home environ- 
ment. Galton in his Hereditary Genius and 
Terman in his Mental and Physical Traits of 
a Thousand Gifted Children found that supe- 
rior children came more often from homes of 
above average social and economic levels than 
from homes below average in these factors. 

Several attempts to measure home environment
have yielded non-quantitative results.²²
The Sims Score Card for Socio-Economic
Status, Form C,²³ offers a quantitative measure
of social and economic status. For this reason 
the Sims Score Card was used in this study. 

The preliminary form of this card was
applied by Sims to 686 sixth-, seventh-, and
eighth-grade pupils representing various social
and economic levels. Bi-serial correlations
were found for each item with each of the
other items of the scale, and correlations were
found between each item and the criterion scores,
which consisted of the average score of all the
items. In its final form this card
consists of 23 items having high correlations
with the criterion but low intercorrelations
with each other.

¹⁹ Arthur Traxler, The Traxler Silent Reading Test (Bloomington, ...

²⁰ F. N. Freeman et al., “The Influence of Environment on the
Intelligence, School Achievement, and Conduct of Foster Children”,
National Society for the Study of Education, ...

²¹ B. S. Burks, “The Relative Influence of Nature and
Nurture upon Mental Development”, National Society for
the Study of Education, 27th Yearbook, Part I, ...

²² Among these attempts may be listed: J. H. ...,
Whittier Scale for Grading Home Conditions, Bulletin No. ...
(Whittier, ...: Whittier State School).

²³ Verner M. Sims, The Measurement of Socio-Economic
Status (Bloomington, Ill.: Public School Publishing Co.,
1928).

Using a group of 100 paired siblings, Sims
found this scale to have a reliability of .94.²⁴
For a group of 72 paired siblings,²⁵ the experimenter
of the present study found a
coefficient of reliability of .84 by use of Furfey’s
intraclass correlational method.²⁶

According to the plan of this study, objec- 
tives other than information were to be 
measured. Results were sought for such goals 
as skill in forming judgments and desirable 
changes in attitudes. Very fortunately, several 
social study tests devised by Wrightstone un- 
der the auspices of the Progressive Education 
Association were available. Upon the basis 
of desired objectives submitted by a group of 
teachers engaged in teaching an experimental 
curriculum, Wrightstone had devised a battery
of five tests for the social studies field.²⁷
From this battery, three tests, “Applying Generalizations
to Social Studies Events,” “Abilities
to Organize Research Materials,” and
“A Scale of Civic Beliefs,” were administered
twice to the pupils in this experiment; the first
giving of these tests was prior to the teaching
of the first three-week unit of work.²⁸

(4) Applying Generalizations to Social- 
Studies Events was constructed on the basis 
of a list of generalizations which a group of 
teachers expected their pupils to reach. This 
list was checked for validity against news- 
paper articles, textbooks, and reference mate- 
rials used in the social studies in classrooms. 
For this test, Wrightstone reports for grades 
X-XII, inclusive, a coefficient of reliability of 
.92 obtained by applying the Spearman—Brown 
prophecy formula to the correlation of scores 
on odd-even halves of the test.²⁹

(5) Abilities to Organize Research Materials,
Revised Form, consists of five parts:
Part I, Ability to Recognize a Suitable Topic
for Research; Part II, Ability to Separate
Irrelevant from Relevant Material; Part III,
Ability to Sense Logicality; Part IV, Ability
to Co-ordinate and Subordinate Appropriate
Data; and Part V, Ability to Organize an
Outline. The items for this test were derived
from the instructional outcomes desired
by the teachers, and from the type of activities
engaged in by pupils. For this test,
Wrightstone reports for grades X-XII, inclusive,
a reliability of .88 obtained by applying
the Spearman-Brown prophecy formula to the
correlation of odd-even test scores.³⁰

²⁴ Ibid., p. 33.

²⁵ ... furnished ... a number of siblings from data
... Mr. Gotham, Mr. G. E. Carlson, and
... a series of experiments which they ...

²⁶ P. H. Furfey, “A Formula for Correlating Interchangeable
Variables,” Journal of Educational Psychology, XVIII ...

²⁷ J. Wayne Wrightstone, “Measuring Some Major Objectives
of the Social Studies,” School Review, XL (1935),
pp. 771-779.

(6) The Scale of Civic Beliefs is an attempt 
to measure civic attitudes and beliefs with 
regard to racial attitudes, international atti- 
tudes, national political attitudes, and national 
achievements. This test consists of 80 true- 
false items. Wrightstone checked these items 
for liberalism against editorials in such maga- 
zines as the Nation and the New Republic 
and also obtained opinions from a group of 
social scientists as to the liberalism or con- 
servatism associated with these items. Upon 
applying the Spearman—Brown prophecy 
formula to the correlation of odd-even scores, 
Wrightstone found this test to give a coefficient
of reliability of .94 for grades X-XII.³¹

For these tests, Wrightstone reported the
following intercorrelations:³²

[Intercorrelation table for (1) Applying Generalizations, (2) Abilities to Organize Research, and (3) Scale of Civic Beliefs. Only the diagonal entry 1.00 is legible in the scan.]

The intercorrelations obtained for the same
tests used as initial tests for the total pupil
population of 375 used in this study were:

[Intercorrelation table for the same three tests. Only the diagonal entry 1.00 is legible in the scan.]

The small size of the intercorrelations given
above indicates that this battery of tests
measures, in spite of a small degree of overlapping,
different functions.

A battery consisting of:

(7) Hill, Test in Civic Attitudes
(8) Hill, Test in Civic Information
(9) Hill-Wilson, Test in Civic Action³³

was given to each pupil prior to the teaching
of the first three-week unit and following the
completion of the second three-week unit of
work. Each of these tests consists of 20
multiple-choice items with each item permitting
one correct answer from five possible
choices. The authors of this battery report
that the items used were selected upon the
bases of adult experiences, courses of study,
civics textbooks, opinions of classroom
teachers and subject experts, and upon tryout
with junior and senior high-school pupils.
No report on the reliability of these tests,
however, is given by the authors.

³⁰ Ibid., p. 776.
³¹ Ibid., p. 776.
³² Ibid., p. 778.

³³ These tests are published by the Public School Publishing
Co., Bloomington, Illinois.

Analysis of the items of these three tests 
seemed to indicate that in spite of the titles 
assigned to two of these tests, information was 
measured to a large degree by all of them. 
The intercorrelations of these three tests when 
applied as initial tests for the population of 
375 pupils included in this study, gave the 
following results: 

                                  1      2      3

1. Hill, Civic Attitudes        1.00    .49    .43
2. Hill, Civic Information              1.00    .50
3. Hill-Wilson, Civic Action                   1.00


That these intercorrelations are low is prob- 
ably due to the low reliability of the indi- 
vidual tests rather than to the differences in 
objectives measured. 
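
The reasoning here, that unreliable tests cannot correlate highly even when they measure the same thing, can be illustrated with Spearman's correction for attenuation. The sketch below is an illustration only and was not part of the analysis reported in this study. The observed value of .49 is the intercorrelation of the Civic Attitudes and Civic Information tests given above; the two reliabilities are hypothetical, since the authors of the tests report none.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's correction: estimated correlation between true scores,
    given the observed correlation and the reliability of each test."""
    return r_xy / math.sqrt(r_xx * r_yy)

observed = 0.49          # Civic Attitudes with Civic Information, as reported
rel_attitudes = 0.60     # hypothetical reliability, for illustration only
rel_information = 0.65   # hypothetical reliability, for illustration only

corrected = correct_for_attenuation(observed, rel_attitudes, rel_information)
print(round(corrected, 2))  # 0.78
```

With reliabilities of this order, a modest observed correlation is compatible with a much closer relationship between the functions actually measured.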

For measuring the outcomes from the 
teaching of the two three-week units, Safeguarding
Public Health and Community Planning,
two objective tests were constructed. The
Health test was applied to the pupils imme- 
diately before and after they had studied the 
first unit and the test on Community Plan- 
ning was similarly applied immediately before 
and after the teacher had taught the second 
of the three-week units. 

(10) The Health Test, Form A,³⁴ consisted
of 48 true-false items, 11 matching items, and
24 multiple-choice items with each of the
latter items permitting one correct answer
from five possible choices. While it had been 
hoped to devise this test to measure objectives 
other than information, careful inspection in- 
dicates that this test measures primarily in- 
formational outcomes. 

(11) The Test on Community Planning,
Form A, consisted of 48 true-false items, 14
multiple-choice items each having five possible
choices, 14 matching questions, and
several items designed to measure pupil
action. It was hoped that this test would
measure objectives other than information by
including items similar to the following:

“Despite the fact that cities have street
cleaning equipment, each individual in a
city should feel responsible in helping to
keep the streets clean.
School pupils ought to have the right to
throw candy wrappers where they please.
Since most hospitals use coal, they
should be located near industrial areas.
Approximately how many discussions
concerning Community Planning have you
had with persons outside of your school
during the past two weeks?”

³⁴ The original form of the test, devised by Mr. C. L.
Daggett ... in educational measurement at the State
..., Whitewater, Wisconsin, was modified for ...


The test items used in these unit tests were 
carefully checked for validity with the text- 
books and reference materials most often used 
by the 8th-grade teachers in teaching health 
and community planning. 

These two unit tests, with an intercorrelation
of .53 as measures of initial status, are
characteristic of the type of objective tests 
designed by teachers to measure pupil per- 
formance. 

For each of the pupil tests, except for the 
Kuhlmann-Anderson Intelligence Test, the
Traxler Silent Reading Test, and the Sims 
Socio-Economic Score Card, it was possible to 
obtain three coefficients of reliability: (1) for 
the test as a measure of initial status; (2) for 
the test as a measure of final status; and 
(3) for the test as a measure of pupil change. 
The reliabilities of initial and final status were 
obtained by taking a random sample of 125 
cases from the total pupil group of 375 cases, 
dividing the initial and final tests into odd and 
even halves, correlating these halves for the 
proper test, and correcting the obtained coeffi- 
cients of reliability by use of the Spearman— 
Brown formula. 
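
The odd-even procedure for initial and final status can be sketched as follows. This is a minimal illustration and not the computation actually carried out for this study: the function names and the item scores are hypothetical, and the Spearman-Brown step-up is written for a test of doubled length, r' = 2r / (1 + r).

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step up a half-test correlation to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores holds one row per pupil, one entry (1 or 0) per item."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))

# Hypothetical answers of five pupils on a six-item test.
pupils = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
]
reliability = split_half_reliability(pupils)
print(round(reliability, 3))
```

In the study itself the same steps would be applied to the sample of 125 cases, once for the initial and once for the final application of each test.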

The coefficients of reliability for these tests
used as measures of pupil change were calculated
by subtracting, for each corresponding
pair of initial and final pupil tests, the odd
score on the initial test from the odd score
on the corresponding final test, and likewise
the even score on the initial test from the even
score on the final test. This gave, for
each test so considered, a new series of odd
changes and a new series of even changes
which, when correlated and “stepped up” by
the Spearman-Brown formula, gave the coefficient
of reliability of the test used as a
measure of pupil change.³⁵
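
The reliability of a test as a measure of change can be sketched in the same manner. The following is an illustration under stated assumptions, not the original computation: the function names and the half-test scores are hypothetical, and each pupil is assumed to have an odd-half and an even-half score on both the initial and the final application.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step up a half-test correlation to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)

def change_reliability(initial_odd, initial_even, final_odd, final_even):
    """Correlate odd-half gains with even-half gains, then step up."""
    odd_change = [f - i for i, f in zip(initial_odd, final_odd)]
    even_change = [f - i for i, f in zip(initial_even, final_even)]
    return spearman_brown(pearson_r(odd_change, even_change))

# Hypothetical half-test scores for five pupils.
initial_odd, final_odd = [10, 12, 8, 15, 11], [14, 15, 13, 16, 17]
initial_even, final_even = [9, 13, 7, 14, 12], [12, 17, 11, 16, 17]

reliability_of_change = change_reliability(
    initial_odd, initial_even, final_odd, final_even)
print(round(reliability_of_change, 3))
```

Since each gain is the difference of two fallible scores, coefficients obtained this way may be expected to run lower than those for initial or final status.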

³⁵ A summary of the tests given to pupils is given in
Table V. A. G. Hellfritzsch offered valuable advice on the
statistical aspects of this study. Assistance in scoring the 


tests and making elementary statistical analyses was sup- 
plied by the Works Progress Administration. 
TABLE IV
COEFFICIENTS OF RELIABILITY OF PUPIL TESTS
(Raw Score)
N=125

[Table body not recoverable from the scan. Rows give Final, Initial, and Change for each of the following tests: Health (Unit I); Community Planning (Unit II); Wrightstone, Abilities to Organize Research Materials; Wrightstone, Scale of Civic Beliefs; Wrightstone, Applying Generalizations to Social Studies Events; Hill, Test in Civic Attitudes; Hill, Test in Civic Information; Hill-Wilson, Test in Civic Action. Legible column headings: Odd Half; Mn; S.D.]

*These tests were weighted in the final criterion in proportion to their reliabilities. The reliability
of the final composite change score was .76.


Table IV indicates the coefficients of reliability
for the unit, Wrightstone, and Hill
tests (expressed in raw score units) as
measures of final status, initial status, and
pupil change, obtained from a constant
sample of 125 cases. The mean and standard
deviation for each odd and even half used in
obtaining these correlations are also indicated.

The coefficients of reliability of tests used
as measures of pupil change should be interpreted
in the same manner as are the coefficients
of reliability of the initial and final
tests. That the reliability coefficients for tests
used to measure pupil change are lower than
those obtained from the initial and final pupil
test status is probably due to the fact that
the reliability coefficients of pupil change contain
the errors of measurement of both the
initial and final applications of the tests used.


TEACHER TESTS³⁶


(1) The American Council on Education
Psychological Examination for College Freshmen,
1936 edition³⁷ (by L. L. Thurstone and
Thelma Gwin Thurstone), was applied to each
teacher cooperating in the experiment. This
test consists of five parts: (1) Completion,
made up of 40 sentences from each of which
one word is missing; the subject is to supply
the correct missing word wherever such an
omission is indicated; (2) Arithmetic, made
up of 20 problems which are intended to cover
the principal divisions of the field of arithmetic;
(3) Artificial Language, in which
the subject is given a list of ten artificial
words and rules for the formation of
plural number, past and future tenses, nouns,
adjectives, and adverbs, which are to be used
to translate 30 sentences, some of which are
translated from English to the artificial
language, while others are translated from the
artificial language to English; (4) Analogies,
consisting of 29 sets of geometric figures;
for each set, 2 geometric figures are given for
which the subject is to perceive a relationship,
and for a given third figure the subject selects
the one of five possible choices that bears the same
relationship to the third figure as that between
the first two figures; and (5) Opposites,
consisting of a list of 33 sets of four words
each; two of the words in each set have either
opposite or the same meanings, and the task of the
subject is to indicate which two words are
opposite or the same in meaning.

³⁶ Sample copies of all tests and rating scales given to
the teachers will be found in Appendix E of Original Thesis
on file in the University Library, University of Wisconsin.

³⁷ Published by the American Council on Education,
Washington, D. C.


TABLE V
SUMMARY OF TESTS APPLIED TO PUPILS

[Column headings: Test; Form; Approximate Working Time in Minutes; Maximum Possible. Only the test names are recoverable from the scan.]

Kuhlmann-Anderson Intelligence Test
Traxler Silent Reading Test, Grades VII-X
Sims Score Card for Socio-Economic Status
Health
Community Planning
Wrightstone, Abilities To Organize Research Materials
Wrightstone, Scale of Civic Beliefs
Wrightstone, Applying Generalizations To Social Studies Events
Hill, Test in Civic Attitudes
Hill, Test in Civic Information
Hill-Wilson, Test in Civic Action

This test was selected because of its widespread
use in educational research and because
of its high reliability (.93 to .98).

(2) The participating teachers were also 
subjected to a second psychological test, The 
Teachers College Psychological Examination, 
1934 Edition,³⁸ which was devised by representatives
of several midwestern teacher-training
institutions.

This examination consists of six parts: (1)
the Vocabulary test lists 80 key words, each
of which is accompanied by five other words.
The subject is to select from each group of
five words the word which is most nearly
like the key word in meaning; (2) the test
on Number Series consists of 25 rows of num- 
bers, each row being made up by some com- 
bination of numbers, such as 3, 6, 9, 12, 15, 

Published by the State Teachers College, St. Cloud, 
Minnesota. 


[Column headings of Table V: Maximum Possible; Approximate Working Time In Minutes; Form]


18. For each row, the subject is to ascertain 
the combination used and then extend the 
given series by two more numbers; (3) the
Same-Opposite test consists of 25 rows of four
words each. For each of these rows, the subject
is to indicate which two words are opposite
in meaning or which two words are alike
in meaning; (4) Arithmetic Reasoning consists
of 15 problems in arithmetic which the
subject is to solve; (5) the Completion Test
is made up of 35 sentences from each of
which one missing word is to be supplied by
the subject; and (6) Analogies consists of 35
rows of words, geometric figures, or numbers.
For each row, the subject is to perceive the
relationship between two given items and is
to select from 5 possible choices that word,
figure, or number which bears the same
relationship to a third given item as the first
two items bear to each other.

On the basis of total raw scores, a reliability 
of .95 was reported. Since this test was stand- 
ardized on teacher college populations and 
since most of our teachers were graduates of 
such teacher training schools, it was thought 
advisable to include this test in the battery 
given to the teachers. 

(3) The American Council Civics and
Government Test, Form B (by Robert D.
Leigh and others), which is primarily a test
of information in American civics and government,
was also applied to each teacher.
For this test a reliability of .82 for 126 cases 
is reported. 

(4) In order to obtain the personal judg- 
ment of the participating teachers on current 
social issues and problems, Parts I and III of 

® Published by World Book Co., Yonkers-on-Hudson, N. Y. 
the test on Social Attitudes of Secondary
School Teachers were applied.

Part I, consisting of 106 true-false test 
items concerned with controversial issues call- 
ing for answers dependent upon judgment and 
opinion, was based upon concepts found primarily
in “high grade journals of opinion.”

Part III, Public Problems Information Test, 
consisted of 100 true-false items designed to 
measure social science information and 
knowledge of recent or current national affairs. 
The key for this test was derived from judges’ 
opinions which represented liberal and con- 
servative tendencies on the issues covered. 

By use of the “split-half” method, reliability
coefficients of .94 for Part I and .93 for
Part III were obtained from a random sample
representing approximately 3700 public secondary
school teachers in both urban and
rural areas in all but five of the 48 states.

(5) A Scale for Measuring Attitude To- 
ward Teachers and the Teaching Profession™ 
(by Tressa C. Yeager) was administered to 
each cooperating teacher. This scale was con- 
structed by obtaining, from 198 high-school 
seniors, a list of 154 statements on attitudes 
towards teaching and opinions on teachers. 
A group of 301 persons, in various professions 
and occupations sorted this list of 154 state- 
ments into eleven piles which represented a 
range from the highest to the lowest apprecia- 
tive attitudes. On the basis of these sortings, 
scale values were assigned to each of the statements.
The final scale was devised from the
responses of 331 high school seniors to whom
the list of 154 statements was submitted.
Comparisons made between a group of seniors 
who had indicated a vocational preference for 
teaching with other groups which had indi- 
cated non-teaching vocational preferences in- 
dicated the performance of the teaching pref- 
erence group on this scale to be superior to 
that of the non-teaching preference groups. 

Splitting this scale in half, Yeager reported 
a “stepped-up” correlation of reliability of .88 
for 100 cases. 

(6) The Morris Trait Index L® (by Eliza- 
beth H. Morris) was designed to measure the 
trait of leadership as defined by Miss Morris. 


Described in “The Teacher and Society,” Yearbook
of the John Dewey Society (New York: John Dewey
Society), Ch. VIII. Test used by [illegible] of Dr. [illegible].


Tressa C. Yeager, An Analysis of Certain Traits of
Selected High School Seniors Interested in Teaching,
Contributions to Education, No. 660 (New York: Bureau of
Publications, Teachers College, Columbia University, 1935),
pp. 87.

Published by Public School Publishing Co., Bloomington,
Illinois.


This test consists of five sections: (1) Sec- 
tion I is composed of 38 items, such as “study- 
ing”, “reading”, “having responsibility” and 
so on, for which the subject is to indicate for 
each item one of five degrees of feelings; 
(2) Section II consists of 14 comments often 
made by teachers to pupils, such as, “Merely 
satisfactory work isn’t enough.” For each of 
these statements, the subject is to indicate to 
which type of pupil (bright, dull, careless, 
lazy, bluffers, conscientious) the comment is 
appropriate; (3) Section III lists 15 different 
classroom situations, each of which is to be 
interpreted as being either amusing, embar- 
rassing, necessitating firm control, interesting, 
or necessitating correction of mistake; (4) 
Section IV is made up of 33 statements in- 
volving personal opinions each of which is 
answered by one response from a five point 
scale ranging from “always true” to “never 
true;” and (5) Section V consists of 7 
multiple choice items which present situa- 
tions for which the subject is to indicate his 
attitude. 

Data on the original test, of which the
above described is an adaptation, were obtained
from a group of 754 persons, which included 
teachers rated as strong or weak by their prin- 
cipals and a group of 402 students from a 
state teachers college, among which were 178 
seniors. The scoring on this test was devel- 
oped by comparing the responses of strong 
and weak teachers and also by ascertaining 
the degree of success in practice teaching of 
some of the teachers-in-training. 

The object of this index as stated by Morris 
was, “to develop a measure of other than in- 
tellectual qualities which contribute signifi- 
cantly to the success of prospective high 
school teachers.” * 

(7) The Orientation Test Concerning
Fundamental Aims of Education, 1935 Revision
(by Alfred S. Lewerenz and Harry C.
Steinmetz), consists of 475 true-false items
that measure the teacher’s knowledge and be- 
liefs in the seven areas of human experience 
corresponding to the seven cardinal objec- 
tives of education. The items are either true 
or doubtful. Among the latter items, however, 
are many that are accepted as true by persons 
having personal prejudices, provincial loyal- 
ties, or common superstitions based upon folk- 


Elizabeth H. Morris, Personal Traits and Success in
Teaching, Contributions to Education, No. 342 (New York:
Bureau of Publications, Teachers College, Columbia University,
1929), p. 32.

Published by the Southern California School Book Depository,
Ltd., Los Angeles, California.
lore. According to the test manual, dogmatic
and superstitious persons receive low scores 
while persons possessing a scientific outlook 
and an open mind receive high scores on this 
test. Many of the statements require consid- 
erable knowledge in the field, if even an open- 
minded person is to do more than guess 
whether the item is true or not; e.g., “Nitro- 
gen is necessary for plant growth;” “German 
silver is a mixture of silver and nickel;” or 
“The book ‘Mother India’ presents a true pic- 
ture of social life in India.” The total score 
used here is the average percentile rank of 
the nine subtests. For the nine subtests, the 
authors reported a coefficient of reliability of
.89 for 152 cases.

(8) The Personality Inventory* (by 
Robert G. Bernreuter) was used in this 
experiment for the purpose of obtaining 
objective values for the traits measured by 
this inventory. 

In the construction of the Inventory, Bernreuter
had recourse to four previously constructed
tests: The Thurstone Neurotic Inventory;
the Laird Test for Introversion, Schedule C;
The Allport Ascendance-Submission
Test; and the Bernreuter Self-Sufficiency
Test. The Inventory, consisting of 125 items,
each of which can be scored Yes, No, or ?,
may be scored on four traits: neurotic tendency
(B1-N); self-sufficiency (B2-S);
introversion-extraversion (B3-I); and
dominance-submission (B4-D). Intercorrelations between
scores on these traits have demonstrated
sufficient overlapping between B1-N and
B3-I, so that usually scoring is confined to
traits B1-N, B2-S, and B4-D.

Using Hotelling’s method of factor analysis,
Flanagan has added two more scales to the
scoring of this Inventory; these scales Flanagan
calls self-confidence (F1-C) and sociability
(F2-S) but fails to define them.
Flanagan has claimed that these two scales
could for all practical purposes replace the
four scales obtained by Bernreuter.

For the four scales, Bernreuter reports a 
range of reliabilities from .85 to .92. The reli- 
Published by Stanford University Press, Stanford University,
California.

R. G. Bernreuter, [illegible] for the Personality Inventory.
[Illegible] has also been substantiated by Ross [illegible],
“[illegible] of the Bernreuter Personality Inventory,” Journal of
Abnormal and Social Psychology, XXVIII (Jan.-March, 1934),
pp. 413-418; and by [illegible] Psychology, XXVIII (1937),
pp. 530-540.

John C. Flanagan, Factor Analysis in the Study of Personality
(1935), Stanford University, California.

Ibid., p. 73.

ability for F1-C is given as .86; for F2-S
as .78. Bernreuter, in the manual for the
Inventory, reports the coefficients of validity
of his four scales with the criterion, made up
of the four tests from which most of the items
were drawn, as ranging from .84 to 1.00. Such
high correlations can, of course, be expected
since both correlates are highly saturated with
identical elements. Lorge and Flanagan
have indicated, however, that the validity of
the Inventory was still an unsettled matter.


(9) The Social Adjustment Inventory 
(Sapich Edition) ,°* (by J. N. Washburne) 
consisting of 123 items some of which call for 
more than one response, was designed to 
measure the traits of truthfulness, sympathy, 
alienation, purpose, impulse-judgment, con- 
trol, happiness, and wish. 


This Inventory was standardized from the 
scores of four groups: (a) public school chil- 
dren divided upon teachers’ and principal’s 
estimates of exceptionally good or poor ad- 
justments; (b) public school children divided 
upon bases of good or poor deportment marks; 
(c) feebleminded boys and girls having made 
favorable or unfavorable adjustments; and, 
(d) a group of prisoners. By comparing the 
responses of these groups it was ascertained 
that the best adjusted individuals made the 
highest adjustment scores on the Inventory, 
the next best adjusted group made the second 
best adjustment scores, etc. 


The reliability of the total adjustment score
was found by Washburne to be .90; the intercorrelations
between traits were found to be
negligible; and the validity of the Inventory,
obtained by correlating scores obtained by
prisoners and by well-adjusted individuals
(bi-serials), was found to be .90.

(10) The Stanford Educational Aptitudes 
Test,** (by Milton B. Jensen) is composed of 
three sections: (1) Position Preference Rat- 
ings; (2) Discipline Case Problems; and 
(3) High School Activities. 

R. G. Bernreuter, “Manual for the Personality Inventory.”

Lorge, “Personality Traits by Fiat II: [illegible],” Journal of
Educational Psychology, XXVI (1935), pp. 652-54.

John C. Flanagan, “Technical Aspects of Multi-Trait
Tests,” Journal of Educational Psychology, XXVI (1935),
pp. 641-51.

Published by J. N. Washburne, Syracuse University,
Syracuse, New York.

J. N. Washburne, “A Test of Social Adjustment,”
Journal of Applied Psychology, XIX (1935), pp. 125-144.

Published by Stanford University Press, Stanford University,
California.
This test was constructed by having seven
university professors of education rate a group 
of 582 persons, who composed the member- 
ship of the American Educational Research 
Association, members of Phi Delta Kappa 
fraternity at Stanford University, and a list 
of city school superintendents, upon their 
abilities as administrators, teachers, and re- 
search workers. To the groups so rated, con- 
taining a total of 307 cases comprising the 
upper and lower 27% in each group, were 
addressed personal report blanks which were 
filled out and returned by 205 persons. On 
the basis of these replies, three scales were 
constructed to measure differences between 
teaching and research abilities, administrative 
and research abilities and teaching and admin- 
istrative abilities. The coefficients of reliability 
of the three scales were obtained from scores 
of persons not rated originally by the judges, 
and were used to obtain weights for the vari- 
ous test items while the validity of the scales 
was obtained by analyzing the responses made 
from the completed personal report blanks. 

For the three traits measured by this test,
the author reported reliability coefficients of
.85, .94 and .91 for the T-R, A-R, and T-A
abilities respectively.

(11) The Test of Teaching Problems (by
T. L. Torgerson) consists of two parts: Part I
is composed of 16 teaching problems for which 
the subject indicates, from a supplementary 
list of possible solutions, those procedures 
that would be used in correcting the pupil 
behavior involved; Part II consists of a list 
of common teaching processes or practices, 
such as, “Give the same assignment to all 
pupils,” and “Visit pupil’s home.” The degree 
to which the teachers follow these practices 
is indicated on a five point scale (from 
“always” to “never”). 

This test was used in order to obtain some 
measure of the teaching practices employed 
by the co-operating teachers. 

(12) The Theory and Practice of Mental 
Hygiene™ (by T. L. Torgerson) was designed 
to obtain diagnostic cues as to whether teach- 
ers could differentiate between causes and 
symptoms of pupil maladjustment, as to how 


Milton B. Jensen, “Objective Differences Between Three
Groups in Education (Teachers, Research Workers, and
Administrators),” Genetic Psychology Monographs, III (1928).

Manual of Directions for Stanford Educational Aptitudes Test.

School of Education, University of Wisconsin, Madison,
Wisconsin (mimeographed).

School of Education, University of Wisconsin, Madison,
Wisconsin (mimeographed).



teachers would go about correcting maladjust- 
ments if the causes were known, as to whether 
teachers could differentiate between behavior 
patterns as demonstrated by pupil action, and 
also what disciplinary procedures are used by 
teachers. 

This test is composed of four parts. In the 
first part the subject is given a list of 40 
statements such as, “Carelessness in school 
work,” and “unnecessary tardiness” for which 
the teacher indicates whether these statements 
are causes or symptoms of pupil maladjust- 
ment. In the second part, 47 common symp- 
toms of pupil maladjustment are listed for 
which the subject is to indicate the proper 
remedial procedure. Eleven specific pupil be- 
havior situations are listed in the third part. 
For each of these, the subject is to supply, 
from a list of 13 behavior patterns, that pat- 
tern which most closely conditions the specific 
pupil behavior. The fourth part consists of a 
list of 34 commonly employed disciplinary 
procedures from which the subject is to check 
the one most frequently used. 

The following tests: 

13. Abilities to Organize Research Material
(by J. W. Wrightstone)

14. Test on Community Planning, Form A 

15. Safeguarding Public Health, Form A 


which were described in the preceding section, 
were also taken by the participating teachers. 


RATING SCALES 


The teaching ability of the teachers in this 
experiment was rated by their principals or 
supervisors and by the experimenters on a 
battery of three rating scales. 

(1) Almy—Sorenson Rating Scale for 
Teachers® (by H. C. Almy and Herbert 
Sorenson) consisting of 20 traits was devised 
from a list of traits submitted by 77 persons 
engaged in educational work. Each of the 20 
traits is divided into 10 points. Some of the 
traits to be rated are those of resourcefulness, 
enthusiasm, leadership, co-operation, etc. The 
authors of this rating scale report a coefficient 
of reliability of .92 for two ratings by the 
same raters from 110 practice teachers. 

(2) The Michigan Educational Association 
Teacher Rating Scale® (by the Michigan 
Education Association) consists of 10 traits 
Published by the Public School Publishing Co., Bloomington,
Illinois.

Published by the Michigan Educational Association,
Lansing, Michigan.
each of which is divided into a number of
sub-items for which ratings on a 5 point scale
(“very inferior” to “very superior”) are possible.
The ratings are expressed numerically
and the total of the assigned values is used to
interpret the teaching skill of the teacher.
(3) The Diagnostic Teacher Rating Scale 
of Instructional Activities™ (by T. L. Torger- 
son) consists of 16 traits each of which has 
five parts. Each of these parts consists of a 
statement describing the classroom activity of 
the teacher being rated. The observer checks 
those activities most closely describing the 
teacher’s performance and the sum of the 


Published by the Public School Publishing Co., Bloomington,
Illinois.
values assigned to these statements is used to
ascertain the teaching ability of the teacher
rated. The coefficients of reliability obtained
from two ratings by the same judges of two
groups of teachers are given by Torgerson as
.86 and .89.


Barr, Torgerson et al. found that each of
these scales when applied twice by the same
superintendents to the same teacher gave the
following coefficients of reliability:
Almy-Sorenson Rating Scale
Michigan Teacher Rating Card
Torgerson Diagnostic Teacher Rating Scale

Barr, Torgerson, et al., The Measurement of Teaching
Efficiency (New York: The Macmillan Co., 1935), p. 87.


TABLE VIa
COEFFICIENTS OF RELIABILITY OF TEACHER TESTS
N=28
(Participating Group)

Test
American Council Psychological Examination
Teachers College Psychological Examination
American Council Civics and Government Test
Social Attitudes of Secondary School Teachers
Yeager—Scale For Measuring Attitudes
Morris Trait Index—L
Lewerenz-Steinmetz—Orientation Test
Bernreuter—Personality Inventory
Health—Unit I
Community Planning—Unit II
Wrightstone—Abilities to Organize Research

*Editor’s Note: The exceedingly low reliabilities here reported, compared with those in Table VIb,
would lead one to suspect some sort of error in administering or scoring these tests.


TABLE VIb
COEFFICIENTS OF RELIABILITY OF TEACHER TESTS
(Combined Groups)

Test
American Council Psychological Examination
Teachers College Psychological Examination
American Council Civics and Government Test
Social Attitudes of Secondary School Teachers
Yeager—Scale for Measuring Attitude
Morris Trait Index—L
Lewerenz-Steinmetz—Orientation Test
Bernreuter—Personality Inventory

The coefficients of reliability for most of the
tests applied to the 28 participating teachers 
are presented in Table VIa. These coefficients 
were obtained by correlating the split halves 
of the test results and stepping up the 
obtained correlations by the Spearman—Brown 
prophecy formula. 
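The stepping-up procedure just described can be sketched as follows. This is a minimal illustration in Python, not the authors' computation: the odd/even division of items and the function names are assumptions made here, since the text does not say how the halves were formed.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_part, k=2.0):
    """Spearman-Brown prophecy formula: reliability of a test k times
    as long as the part whose reliability is r_part."""
    return k * r_part / (1.0 + (k - 1.0) * r_part)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per person.
    Assumption: halves are the odd-numbered and even-numbered items."""
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_half, even_half))


# A half-test correlation of .80 steps up to roughly .89 for the full test.
stepped_up = spearman_brown(0.80)
```

With only two halves the formula reduces to 2r / (1 + r), so the stepped-up coefficient always exceeds the half-test correlation when that correlation is between 0 and 1.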


Test results for an additional group of
teachers, similar in training, experience, age,
grades taught, etc., to the 28 participating
teachers, were also available. This group had
taken, with several exceptions, the same battery
of tests which had been administered to

the participating teachers. Since the coeffi- 


TABLE VII
TESTS AND RATING SCALES APPLIED TO TEACHERS
(Teacher Tests)

Name of Test or Rating Scale; Publisher; Edition or Form

American Council Psychological Examination; American Council on Education, Washington, D. C.; 1936
Teachers College Psychological Examination; State Teachers College, St. Cloud, Minnesota; 1934
American Council Civics and Government Test; World Book Company, Yonkers-on-Hudson, N. Y.; Form B
Social Attitudes of Secondary-School Teachers; John Dewey Society, New York, New York
Health Test—Unit I; University of Wisconsin, Madison, Wisconsin
Test on Community Planning; University of Wisconsin, Madison, Wisconsin
Wrightstone—Abilities to Organize Research Material; Teachers College, Columbia University, New York, New York
Yeager—Scale For Measuring Attitude Towards Teachers and the Teaching Profession; Teachers College, Columbia University, New York, New York
Torgerson—Theory and Practice of Mental Hygiene; University of Wisconsin, Madison, Wisconsin
Torgerson—Teaching Problems (Mimeo.); University of Wisconsin, Madison, Wisconsin
The Bernreuter Personality Inventory; Stanford University Press, Stanford University, California
Morris Trait Index—L; Public School Publishing Company, Bloomington, Illinois
Washburne Social Adjustment Inventory; Syracuse University, Syracuse, New York
Stanford Educational Aptitudes Test; Stanford University Press, Stanford University, California
Lewerenz-Steinmetz—Orientation Test; Southern California School Book Depository, Los Angeles, California
Almy-Sorenson—Rating Scale for Teachers; Public School Publishing Company, Bloomington, Illinois
Torgerson Diagnostic Rating Scale of Instructional Activities; Public School Publishing Company, Bloomington, Illinois
Michigan Teacher Rating Scale; Michigan Education Association, Lansing, Michigan

cients of reliability for a larger sample should
be more valid, the test results of both groups
were combined in order to obtain new coefficients
of correlation. The coefficients, obtained
in the manner described above, are
listed in Table VIb.

Examination of Tables VIa and VIb shows
that in most instances the coefficients of reliability
calculated from the combined groups
are larger in magnitude than those calculated
from the group of participating teachers.

A list of the tests and rating scales applied 
to the teachers is given in Table VII. 


SECTION III 


ESTABLISHMENT OF SEVERAL CRI- 
TERIA OF TEACHING ABILITY 


The first step, in building a criterion score 
of teaching ability for each teacher, consisted 
in calculating the mean and standard devia- 
tion®* of the final, initial, and change scores 
of the pupils in each class for each of the 
eight pupil tests. 

Each set of 28 change score means derived 
for the pupils of the 28 classes represents the 
extent to which the goals of education as 
measured by that test were attained during 
the interval between the initial and final giv- 
ing of the test. Each set of 28 means can be 
used as a set of indices of teaching ability. 
However, in view of the fact that these eight 
separate sets of scores are probably not unre- 
lated and since the reliability of some of these 
tests taken singly is rather low, it was thought 
desirable to combine these eight measures into 
fewer and more reliable composites. 

In order to decide in what manner these 
eight measures of pupil change should be com- 
bined, it was essential, first of all, to know the 
extent to which the goals of education they 
measure are related to each other. Some in- 
formation concerning this relationship was 
found in the size of the intercorrelations of 
the eight final tests with each other, the eight 
initial tests with each other, and the eight 
change scores with each other. These inter- 
correlations are given in Tables VIII, IX, 
and X. 

Because many of these intercorrelations 
appear to be quite low, it is well to consider 


Data for the additional group were used with the per- 
mission of Mr. Gotham. 

Throughout this study, standard deviations when expressed
for classes were obtained by using $\sqrt{N-1}$ in the
denominators.
how large they must be in order to be significantly
different from zero. Since correlations
based on random samples of 375 pairs
of values from an uncorrelated population are
normally distributed about zero with a standard
error of $1/\sqrt{N-1}$, which here
equals .0517, correlations in these tables
larger than .10 indicate the presence of a real
relationship with a considerable degree of
confidence.
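The figure .0517 quoted above is what the large-sample expression 1/√(N − 1) yields for 375 pairs. Assuming that is the formula intended (the scanned text does not show it in full), the arithmetic can be checked with a short Python sketch; the function name is hypothetical.

```python
from math import sqrt


def se_of_zero_correlation(n_pairs):
    """Large-sample standard error of r when the population correlation
    is zero. Assumption: the 1 / sqrt(N - 1) form, which reproduces the
    .0517 quoted in the text for N = 375."""
    return 1.0 / sqrt(n_pairs - 1)


se = se_of_zero_correlation(375)

# How many standard errors from zero is a correlation of .10?
ratio = 0.10 / se
```

Under these assumptions a correlation of .10 lies a little under two standard errors from zero, which is the sense in which the text speaks of "a considerable degree of confidence."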


An inspection of these three tables of inter- 
correlations seems to show that there is a 
greater degree of association between either 
the initial or final status of these pupils, relative
to the goals measured by these tests, than
there is between the progress the pupils made 
in the direction of the several goals during the 
course of this study. Since the size of inter- 
correlations between two measures depends, 
amongst other things, not only upon the de- 
gree to which the educational functions 
measured overlap, but also upon the reliabil- 
ity of each measure, the intercorrelations be- 
tween the change scores in Table X would 
probably be considerably higher if the change 
measures had been more reliable. 


The intercorrelations recorded in Tables 
VIII, IX, and X show, then, that although 
the measures there listed have something in 
common, there is no great degree of over- 
lapping between any particular combination 
of the eight tests. In other words, these intercorrelations
by themselves do not reveal any
clear-cut manner in which these tests may be 
composited. Consequently, in order to arrive 
at a meaningful way of combining these tests, 
it becomes necessary to supplement these 
intercorrelations with other considerations. 


It will be recalled that the Wrightstone and 
Hill tests were applied at the beginning and 
end of a six-month teaching period, whereas 
the Unit tests were applied at the beginning 
and end of three-week teaching periods. The 
former, then, measure long-time changes; the 
latter, short-time changes. Furthermore, since 
the unit tests, as described in Section II, 
measure more specific outcomes of learning 
than do the other tests, it would appear logical 
to combine these two tests to furnish a single 
measure of teaching ability as manifested by 
specific short-time pupil changes. 

An examination of the three Wrightstone
and three Hill tests reveals that the Wrightstone
tests emphasize the non-informational
or “intangible” objectives, whereas the Hill 
tests, although labelled differently, seem to 
measure little more than information. Conse- 
quently, the three Hill tests may be combined 
to furnish a single measure of long-time infor- 
mational change. The remaining three Wright- 
stone tests, which emphasize non-informa- 
tional objectives, represent in combination a 
measure of the kind of objectives that modern 
educational thinking has especially assigned 
to the social studies. The three Wrightstone 
tests may then also be combined to furnish a 
measure of long-time, non-informational 
change. 

A crude but nevertheless informative picture
of how these three sets of tests are related
to each other may be obtained by calculating
from Tables VIII, IX, and X, the average
intercorrelations of the tests comprising the 
Unit set, the tests comprising the Wrightstone 
set, and the tests comprising the Hill set 
within each set and between sets. From Table 
XI, which lists these average correlations, we 
find that the average intercorrelation between 
the Unit set and Wrightstone set for final 
scores is .31. This value was obtained by aver- 
aging the six correlations in Table VIII repre- 
senting the intercorrelations of the two Unit 
tests with the three Wrightstone tests. The 
other entries in Table XI were similarly 
obtained. 
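The averaging just described can be sketched in Python. The correlation values below are hypothetical placeholders, chosen only so that the six Unit-by-Wrightstone entries average .31 as in the text; the actual entries of Table VIII are not reproduced in this section, and the test labels (U1, W1, and so on) are likewise invented for illustration.

```python
from itertools import combinations


def pair_key(a, b):
    """Order-free key for a pair of tests."""
    return frozenset((a, b))


def mean_between(corr, set_a, set_b):
    """Average of all correlations between a test in set_a and one in set_b."""
    values = [corr[pair_key(a, b)] for a in set_a for b in set_b]
    return sum(values) / len(values)


def mean_within(corr, tests):
    """Average of all correlations among the tests of a single set."""
    values = [corr[pair_key(a, b)] for a, b in combinations(tests, 2)]
    return sum(values) / len(values)


unit = ["U1", "U2"]                # the two Unit tests
wrightstone = ["W1", "W2", "W3"]   # the three Wrightstone tests

# HYPOTHETICAL correlations, not taken from Table VIII.
corr = {
    pair_key("U1", "U2"): 0.40,
    pair_key("W1", "W2"): 0.30,
    pair_key("W1", "W3"): 0.34,
    pair_key("W2", "W3"): 0.32,
    pair_key("U1", "W1"): 0.30,
    pair_key("U1", "W2"): 0.28,
    pair_key("U1", "W3"): 0.35,
    pair_key("U2", "W1"): 0.33,
    pair_key("U2", "W2"): 0.29,
    pair_key("U2", "W3"): 0.31,
}

unit_vs_wrightstone = mean_between(corr, unit, wrightstone)  # six values
within_wrightstone = mean_within(corr, wrightstone)          # three values
```

Two sets of two and three tests give 2 × 3 = 6 between-set correlations, which is why six entries are averaged for the Unit and Wrightstone sets.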

Inspection of Table XI shows that the cor- 
relation of the two Unit tests with each other 
is higher than their average correlation with 
either the three Wrightstone or three Hill tests 
on either the final, initial, or change measures. 
Similarly, the average intercorrelation of the 
three Hill tests with each other is higher than 
their average intercorrelations with the two 
Unit or three Wrightstone tests as final, initial, 
or change measures. The average correlation 
of the three Wrightstone tests as measures of 
change with each other is higher than the 
average intercorrelations with the unit and 
Hill tests as measures of change. Although 
this is not true for the Wrightstone tests as 
measures of final and initial status, the aver- 
age intercorrelations of the Wrightstone tests 
with each other in these two cases are essen- 
tially as great as their average intercorrela- 
tions with the Unit or Hill tests. It appears, 
therefore, that the manner of composition sug- 
gested by the considerations stated in the last 
paragraph leads to three sets of tests with the 
tests in each set bearing more resemblance to 


each other than they do to the contents of the 
other sets. It would then appear that these 
three composites are meaningful from the 
point of view of the kinds of objectives 
measured and the manner in which they are 
interrelated.” 

Having settled upon the manner in which 
the tests are to be composited, the question 
arises as to what relative weights shall be 
given to the tests in each set. One could get 
a composite measure for each set of tests by 
simply adding together the raw scores each 
pupil made on the tests in each set. This 
would result, however, in assigning relative 
weights to the several tests in a set in direct 
proportion to their standard deviations. 
Since tests with many items tend to have 
larger standard deviations than do shorter 
tests, this procedure would result in weighting 
tests in proportion to their length regardless 
of other considerations. 

Neither wishing to do the above nor know- 
ing of any reason why one test in a set should 
be given more weight than any other test in 
the same set, it was thought advisable to 
weight the tests within each set equally. Such 
a composite can be obtained by dividing the 
scores a pupil makes on the several tests of a 
given set by the respective standard deviations 
of the distribution of scores for those tests and 
adding the resulting quotients to get a single 
composite score for each pupil. 
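A minimal Python sketch of this equal weighting follows. The scores and standard deviations are hypothetical and the function name is an assumption; the point is only that dividing by each test's standard deviation puts a long test and a short test on the same footing before they are added.

```python
def equal_weight_composite(scores, sds):
    """Sum of a pupil's test scores, each divided by the standard
    deviation of the score distribution for that test."""
    if len(scores) != len(sds):
        raise ValueError("need one standard deviation per test")
    return sum(x / s for x, s in zip(scores, sds))


# HYPOTHETICAL pupil: a long test (S.D. 8) and a short test (S.D. 3).
raw_scores = [24.0, 9.0]
standard_deviations = [8.0, 3.0]

raw_sum = sum(raw_scores)  # dominated by the longer test
composite = equal_weight_composite(raw_scores, standard_deviations)
```

In the raw sum the longer test supplies 24 of the 33 points, whereas in the composite each test contributes three standard-deviation units.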

In getting a composite of three raw initial
test scores, one would ordinarily divide X - M
by the standard deviations of the distributions
of the initial test scores, and for the
composite of the final raw scores, X - M by
the standard deviations of the distributions
of final raw scores would be used. The standard
deviations of the eight tests as initial tests
are not greatly different from their standard 
deviations as final measures.** Since these 
differences are small and since the use of a 
uniform standard deviation for obtaining the 
final, initial, and change standard scores of a 
single test furnishes a convenient check on 
arithmetical calculations, it was decided to 
use a single estimate of variability for each 
test. This single standard deviation was ob- 

® The three composites will henceforth be referred to as 
the Unit, Wrightstone, and Hill i will be 
designated as U, 

i iations. T. L. Kelley, “Si a Method.” 
(New York: The MacMillan Co., 1923), Section 34, pp. 

= Table A, (Appendix A), Original Thesis on file, Library, 
University of Wisconsin. 
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tained by pooling both the sum of squares of 
deviations around the means and degrees of 
freedom for the final and initial distributions 
of scores for each test and calculating from 
these pooled quantities, a single standard 
deviation.” Table XII lists the standard 
deviation of final and initial scores for each 
test as well as the pooled standard deviations 
as outlined above. 
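
A minimal sketch of the pooling step is given below. The function name and the short score lists are hypothetical. The last lines check the Health test figures quoted in the footnote (sum of squares 44488, S.D. 7.712); the 748 degrees of freedom used there is an assumption, taken as 375 pupils on each of two occasions less two, since that figure did not survive in the text.

```python
import math


def pooled_sd(final_scores, initial_scores):
    """Pool sums of squares and degrees of freedom of two score distributions."""

    def sum_of_squares(scores):
        mean = sum(scores) / len(scores)
        return sum((x - mean) ** 2 for x in scores)

    total_ss = sum_of_squares(final_scores) + sum_of_squares(initial_scores)
    total_df = (len(final_scores) - 1) + (len(initial_scores) - 1)
    return math.sqrt(total_ss / total_df)


# Hypothetical final and initial scores for three pupils.
print(round(pooled_sd([1, 2, 3], [2, 4, 6]), 4))

# Check against the Health test totals, assuming 748 degrees of freedom.
health_sd = math.sqrt(44488 / 748)
print(round(health_sd, 3))
```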


Using these pooled standard deviations, the
final, initial, and change raw scores for each
pupil were combined to get final, initial, and
change composite scores for the Unit, Wrightstone,
and Hill sets of tests. The following
example illustrates the manner in which the
various composite scores were calculated for
each pupil:

Let us assume the following table to represent
the raw scores of a particular pupil:

Test                       Final   Initial   Change
Wright. Beliefs
Wright. Generalizations
Hill Attitudes
Hill Information
Hill Action

The following illustrates the method by which the pooled
standard deviation was obtained for the Health Unit Test (Unit I).

         Degrees of Freedom   Σ(X - M)²   Variance   S.D.
Total                         44488       59.48      7.712

The following nine composite scores were
calculated by dividing the raw scores by the
corresponding pooled standard deviations, as
given in Table XII, and then summing as
follows:

Unit Composite Final Score
Unit Composite Initial Score
Unit Composite Change Score
Wrightstone Composite Final Score
Wrightstone Composite Initial Score
Wrightstone Composite Change Score
Hill Composite Final Score
Hill Composite Initial Score
Hill Composite Change Score


The three sets of 28 class means of the 
Unit composite change scores, Wrightstone 
composite change scores, and Hill composite 
change scores, represent the average progress 
made by the pupils of the various classes to- 
wards the composite educational goals as 
measured by the tests applied and therefore 
represent three criteria of teaching ability. 


TABLE XII
COMPARISON OF FINAL, INITIAL, AND POOLED STANDARD DEVIATIONS FOR EACH TEST

Test
Health (Unit I)
Community Planning (Unit II)
Wrightstone Abilities
Wrightstone Beliefs
Wrightstone Generalization
Hill Attitudes
Hill Information
Hill Action

It will be recalled that one of the reasons
for making these composites was to increase
the reliability of the criterion of teaching
ability. The reliabilities of the final and initial
composites as well as the change composites
were calculated from the reliabilities and
intercorrelations of the raw measures according
to a formula given by Kelley. (Table XIII)


TABLE XIII
RELIABILITIES OF COMPOSITES OF PUPIL TEST SCORES

                         Final   Initial   Change
Unit Composites           .82     .78       .37
Wrightstone Composites    .93     .93       .75
Hill Composites —. 7% .44 

A comparison of these reliabilities with the 
reliabilities of the raw measures, as listed in 
Table IV, reveals that the reliabilities of the 
composite measures are in most cases sub- 
stantially higher than those for the original 
single measures. In light of the above, it was 
thought advisable to consider the possibility 
of further combining these three composite 
criteria into a single composite. It is true that 
each set of tests measures an important area 
of the total outcome of the educational goals 
outlined in this study but for purposes of this 
investigation it appeared meaningful to obtain 
also a single measure of the extent to which 
the pupils approached these goals. 

One could arrive at such a single composite 
by assigning equal weights to each of the 
three composites already obtained. Since the 
reliabilities of two of these composites are 
rather low, it was decided, however, to weight 
each composite in such a manner that the 
reliability of the resultant single composite be 
a maximum. 

Since, as will be subsequently seen, it is
necessary to make allowances in the mean
pupil gains for the variability of the mean
pretest scores of the various classes, it is
desirable to maximize not only the reliability
of the UWH composite gain, but also that of the
UWH composite of initial status.

Truman L. Kelley, "Statistical Method," (New York:
Macmillan Co., 1923), p. 197, (Formula 147).

At a later time in the development of the criterion of
teaching ability it is necessary to equate pupil groups
on those factors which condition the extent to which they gain.
Since one of the factors is initial status, it is desirable
that the measure of initial status be as reliable as possible.

Initial Test S.D.
7.05
8.24
20.28
13.44
13.27
2.66
3.02
2.88


The problem of making a single composite
of the three composites already arrived at then
resolves itself into discovering that set of
weights, $w_1$, $w_2$, and $w_3$, for the U, W, and H
composites respectively, which will simultaneously
maximize the reliabilities of the UWH
composite as both a measure of initial status
and of change.

To solve this problem, separate sets of
optimum weights were first obtained for the
UWH composite of initial status and UWH
composite gain as follows:

If we let $X_1$, $X_2$, $X_3$, $X_4$, $X_5$, $X_6$, $X_7$, and
$X_8$ represent the raw scores on the Health,
Community Planning, Ability to Organize
Research Materials, Scale of Civic Beliefs,
Generalizations, Hill Civic Attitudes, Hill
Civic Information, and Hill Civic Action tests
respectively, and $S.D._1$, $S.D._2$, $S.D._3$, $S.D._4$,
$S.D._5$, $S.D._6$, $S.D._7$, and $S.D._8$ the corresponding
standard deviations, the weighted sum
whose reliability we wish to maximize becomes:

\[
S_1 = w_1\frac{X_1}{S.D._1} + w_1\frac{X_2}{S.D._2}
    + w_2\frac{X_3}{S.D._3} + w_2\frac{X_4}{S.D._4} + w_2\frac{X_5}{S.D._5}
    + w_3\frac{X_6}{S.D._6} + w_3\frac{X_7}{S.D._7} + w_3\frac{X_8}{S.D._8}
\]

where $w_1$ = weight of tests in Unit composite

$w_2$ = weight of tests in Wrightstone composite

$w_3$ = weight of tests in Hill composite

Final Test S.D.
8.30
8.57

7 The UWH composite denotes the single measure into which 
U, W, and H composites will be combined. 

The solution to follow, which is an adaptation of Kelley's
general formulae for the correlation between weighted averages,
was suggested by Hellfritzsch.

Letting the raw scores that might be obtained
upon a second administration of this
battery of tests be denoted by primes, the
weighted sum of scores of a second application
would become:

\[
S_2 = w_1\frac{X'_1}{S.D._1} + w_1\frac{X'_2}{S.D._2}
    + w_2\frac{X'_3}{S.D._3} + w_2\frac{X'_4}{S.D._4} + w_2\frac{X'_5}{S.D._5}
    + w_3\frac{X'_6}{S.D._6} + w_3\frac{X'_7}{S.D._7} + w_3\frac{X'_8}{S.D._8}
\]








The problem of maximizing the reliability of
this weighted sum is identical with maximizing
the correlation between $S_1$ and $S_2$, i.e.,

\[
r_{S_1 S_2} = \frac{\sum s_1 s_2}{N \cdot S.D._{S_1} \cdot S.D._{S_2}}
\]

where $s_1$ and $s_2$ are the deviations of $S_1$ and $S_2$ from their means.

Kelley has given a formula for calculating
$r_{S_1 S_2}$ if the intercorrelations of the type $r_{x_i x_j}$,
$r_{x_i x'_i}$, and $r_{x'_i x_j}$ (where $i \neq j$) as well as the
weights are known. In our case, the $r_{x_i x'_i}$'s are
the reliability coefficients, and the $r_{x_i x_j}$'s =
$r_{x'_i x_j}$ are the correlations of the eight
tests with each other and have already been
listed in Tables IX and X.
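
The dependence of composite reliability on the weights can be illustrated with a short sketch. It uses one standard expression for the correlation between two parallel weighted sums of standardized scores, under the condition stated above that cross-administration correlations equal the ordinary intercorrelations. It is not offered as a transcription of Kelley's Formula 148, and the function name and numerical inputs are hypothetical.

```python
def composite_reliability(weights, reliabilities, intercorr):
    """Correlation between two parallel weighted sums of standardized scores.

    Assumes every component has unit variance on both administrations and
    that r(x_i, x'_j) equals r(x_i, x_j) whenever i differs from j.
    """
    k = len(weights)
    cross = sum(
        weights[i] * weights[j] * intercorr[i][j]
        for i in range(k)
        for j in range(k)
        if i != j
    )
    numerator = sum(w * w * r for w, r in zip(weights, reliabilities)) + cross
    denominator = sum(w * w for w in weights) + cross
    return numerator / denominator


# Hypothetical case: two tests, each with reliability .50, correlating .50.
two_tests = composite_reliability([1, 1], [0.5, 0.5], [[1.0, 0.5], [0.5, 1.0]])
print(round(two_tests, 4))
```

With two parallel tests the result agrees with the Spearman-Brown value for a test of doubled length, 2(.50)/(1 + .50), which serves as a check on the expression.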


Since we are interested only in the relative
weights, we can let $w_3 = 1$, so that only $w_1$
and $w_2$ need to be determined. To get the
values of $w_1$ and $w_2$ which will make $r_{S_1 S_2}$ a
maximum, we need therefore to obtain the
simultaneous solution of the two equations in
$w_1$ and $w_2$ which result when we set the partial
derivatives of $r_{S_1 S_2}$ with respect to $w_1$ and
$w_2$ each equal to zero, that is,

\[
\frac{\partial r_{S_1 S_2}}{\partial w_1} = f(w_1, w_2) = 0
\]

\[
\frac{\partial r_{S_1 S_2}}{\partial w_2} = F(w_1, w_2) = 0
\]

Truman Kelley, "Statistical Method," (New York: Macmillan
Co., 1923), p. 198, (Formula 148).

The two equations $f(w_1, w_2) = 0$ and
$F(w_1, w_2) = 0$ turn out to be cubic equations
and were graphically solved.

The relative weights $w_1$, $w_2$, and $w_3$
which give the greatest reliability for the
UWH composite of initial status turn out
to be:

$w_1 = 2$
$w_2 = 7$
$w_3 = 1$

to the nearest integer. These weights, when
substituted in Kelley's formula, would give a
UWH composite of initial status with a reliability
of .94.

The corresponding weights for maximizing
the reliability of the UWH composite change
are:

$w_1 = 1/2$
$w_2 = 6$
$w_3 = 1$

to the nearest one-half. Using these weights,
the reliability of the UWH composite as a
measure of change would be .76.

Since it was desired to use the same set of 
weights for both the initial and change com- 
posites, that set of weights was found (by 
trial and error) by exhausting all of the com- 
binations of integral weights such as (2, 8, 1), 
(1, 5, 1), etc., in the neighborhood of the 
above weights which resulted in initial and 
change reliabilities whose sum of deviations 
from the above reliabilities was a minimum. 
This set of weights turned out to be: 

$w_1 = 1$
$w_2 = 7$
$w_3 = 1$


and is the set which was used in combining 
the Unit, Wrightstone, and Hill composites 
into a single measure of both initial status 
and change. 
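
The trial-and-error search for a common set of integral weights might be sketched as below. Everything numerical here is illustrative: the reliabilities and the uniform intercorrelations are hypothetical stand-ins, the three composites are treated as single standardized scores rather than as sums over eight tests, and the maxima are taken over the grid itself rather than from the graphical solution.

```python
from itertools import product


def reliability(weights, rel, corr):
    """Reliability of a weighted sum of standardized scores (see assumptions above)."""
    k = len(weights)
    cross = sum(
        weights[i] * weights[j] * corr[i][j]
        for i in range(k)
        for j in range(k)
        if i != j
    )
    numerator = sum(w * w * r for w, r in zip(weights, rel)) + cross
    return numerator / (sum(w * w for w in weights) + cross)


def best_common_weights(initial, change, grid):
    """Pick the weights whose two reliabilities fall least short of the two maxima."""
    best_initial = max(reliability(w, *initial) for w in grid)
    best_change = max(reliability(w, *change) for w in grid)

    def shortfall(w):
        return (best_initial - reliability(w, *initial)) + (
            best_change - reliability(w, *change)
        )

    return min(grid, key=shortfall)


def uniform_corr(c):
    """3 x 3 correlation matrix with the same off-diagonal value (hypothetical)."""
    return [[1.0 if i == j else c for j in range(3)] for i in range(3)]


# Hypothetical inputs for the U, W, and H composites, in that order.
initial = ([0.78, 0.93, 0.70], uniform_corr(0.5))
change = ([0.37, 0.75, 0.44], uniform_corr(0.3))

# Integral weights (w1, w2, 1), with the third weight fixed at unity.
grid = [(a, b, 1) for a, b in product(range(1, 9), repeat=2)]
weights = best_common_weights(initial, change, grid)
print(weights)
```

Fixing the third weight at unity mirrors the remark that only relative weights matter, so the search runs over two weights only.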

The values of the initial status and change
reliabilities corresponding to each of the above
three sets of weights are as follows:

Reliabilities of UWH Composites

Weights                 Reliability as measure of
w_1    w_2    w_3       Initial Status   Change    Comments
1/2    6      1         .936                       This set of weights maximizes change reliabilities.
2      7      1         .943                       This set of weights maximizes initial status reliabilities.
1      7      1         .941                       This set of weights was used.

From the above it is seen that although the
set of weights actually used does not furnish 
the mathematically maximal reliabilities in 
either the case of initial status or change, it 
leads to reliabilities that are practically 
identical with the maximal values if all are 
rounded to two decimal places. 

The mean changes for each class as repre- 
sented by the U composite, W composite, 
H composite, and UWH composite, which 
have thus far been discussed, have all been 
built up from the differences between the raw 
final and initial status scores the pupils made 
on the eight tests (two Unit tests, three 
Wrightstone tests, and three Hill tests). 

Below are listed the correlation coefficients
which were calculated, for the total group of
375 pupils, between initial status (i) and
change (c) as measured by the single tests
and the U, W, and H composites. The values of
$r_{ic}$ are: $U_1 = -.18$; $U_2 = -.44$; $W_1 = -.45$;
$W_2 = -.43$; $W_3 = -.54$; $H_1 = -.51$;
$H_2 = -.44$; $H_3 = -.49$; $U_c = -.26$;
$W_c = -.46$; and $H_c = -.56$.

These coefficients are all negative and indi- 
cate the tendency for pupils with high initial 
status scores to gain less than pupils with low 
initial status scores. This tendency may be 
due to the fact that a test of finite length 
offers less opportunity for a pupil with a high 
initial test score to gain than it does for one 
with a low initial test score, or to the practice 
of many teachers of concentrating their teach- 
ing efforts upon the children in the class who 
are at the lower achievement levels, or to 
some other factors. Be that as it may, it remains
that a class of pupils having a low
initial test average would have a better chance
of making larger gains than one having a
higher initial test average. An index of teaching
ability based upon average pupil change
would then tend to favor the teacher whose
class began at a lower level of achievement.

PLATE 1
Regression Curve of Average Gain on Pre-test Score for Unit Composite ($U_c$)

Editor's note: The reader may be interested in examining
the data on this point in subsequent studies of this report.

Since for the purpose of this study it is 
necessary to derive an index of teaching abil- 
ity which is free from such tendencies, it was 
thought desirable to adjust the raw gains 
made by each pupil in such a manner that 
the pupil’s chances of gaining a given amount 
as measured by the adjusted gains would be 
independent of the size of the particular initial 
status scores.”*” 


PLATE 2
Regression Curve of Average Gain on Pre-test Score for W Composite ($W_c$)

The notation is as follows:
$U_1$ Health Test (Unit I)
$U_2$ Comm. Planning (Unit II)
$W_1$ Wright. Abilities
$W_2$ Wright. C. Beliefs
$W_3$ Wright. Generalizations
$H_1$ Hill Attitudes
$H_2$ Hill Information
$H_3$ Hill Action
$U_c$ Unit composite
$W_c$ Wright. composite
$H_c$ Hill composite
$r_{ic}$ Correlation between initial status and change

Plates I, II, and III show the regression
curves of average change on initial test scores 
for the U, W, and H composite measures. 
These curves were drawn in such a manner 
that they gave a smooth curve fitted to the 
indicated points, as well as a free-hand curve 
might be. The indicated points represent the
average gains made by from 35 to 95 pupils 
whose initial test scores lay within the par- 
ticular initial test score intervals correspond- 
ing to the points. These curves reflect the 
same tendency of pupils with high initial test 
scores to gain less and vice versa, which was 
revealed by the negative coefficients listed on 
the preceding page. 

An adjustment of individual pupil change
was then based upon the assumption that the 
average raw changes made by groups of pupils 
having various average initial test scores are 
educationally equivalent. The adjustment 
made of each pupil’s U, W, or H composite 
changes was to divide the pupil’s observed raw 
change by the average raw change (read from 
the curve) made by the pupils having the 
same initial test score. Thus all pupils whose 
combination of actual change and initial test 
score, which when plotted upon the graph, fell 


exactly on the curve of regression would have 
an adjusted change score equal to unity. 
Those points falling below the curve would 
correspond to a value something less than one, 
and those above, something greater than one. 
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Tables XIV, a, b, and c, list the divisors 
(taken graphically from the curve) for the 
various initial test intervals which were used 
in adjusting the raw changes as measured by 
the U, W, and H composite measures. 
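
The table lookup just described may be sketched as follows. The interval lower bounds are the first three legible ones from Table XIVa, but the divisors are hypothetical, since the printed divisors did not survive; the function and variable names are likewise illustrative.

```python
from bisect import bisect_right


def adjusted_change(initial_score, raw_change, lower_bounds, divisors):
    """Divide a raw change by the divisor of the interval holding the initial score.

    lower_bounds must be ascending; divisors[i] applies from lower_bounds[i]
    up to (but not including) lower_bounds[i + 1].
    """
    index = bisect_right(lower_bounds, initial_score) - 1
    if index < 0:
        raise ValueError("initial score lies below the first interval")
    return raw_change / divisors[index]


# Lower bounds as printed in Table XIVa; the divisors are hypothetical.
lower_bounds = [9.00, 11.30, 12.80]
divisors = [3.0, 2.5, 2.0]

# A pupil whose raw change equals the average for his interval scores unity.
on_curve = adjusted_change(10.0, 3.0, lower_bounds, divisors)
above_curve = adjusted_change(12.0, 5.0, lower_bounds, divisors)
print(on_curve, above_curve)
```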


TABLE XIVa

TABLE FOR CONVERTING UNIT COMPOSITE RAW
CHANGE SCORES INTO UNIT COMPOSITE
ADJUSTED CHANGE SCORES

If Initial Raw Composite     Divide Raw Composite
Score Falls Between          Change By
9.00-11.29
11.30-12.
12.80-14.
14.03-14.
15.00-15.
15.90-16.
16.58-17.
17.20-17.
17.80-18.
18.30-18.
18.79-19.
19.24-19.


The resultant three U, W, and H adjusted
composite changes were combined in a
1-7-1 ratio, as were the U, W, and H
raw composites, to give a UWH adjusted composite
change ($UWH_{A.C.}$) for each pupil. The
mean adjusted composite change for each class 
and for the combination of all classes for the 


TABLE XIVb

TABLE FOR CONVERTING WRIGHTSTONE COMPOSITE RAW CHANGE SCORES INTO WRIGHTSTONE
COMPOSITE ADJUSTED CHANGE SCORES

If Initial Raw Composite     Divide Raw Composite
Score Falls Between          Change By
11.17-11.33
12.11-12.31
12.32-12.52
12.53-12.77
12.78-13.00
13.01-13.27
13.28-13.51
13.52-13.81
13.82-14.09
14.40-14.72
14.73-15.07
15.08-15.48

TABLE XIVc

TABLE FOR CONVERTING HILL COMPOSITE RAW CHANGE SCORES INTO HILL COMPOSITE
ADJUSTED CHANGE SCORES

If Initial Raw Composite     Divide Raw Composite
Score Falls Between          Change By
9.830-10.03
10.04-10.
0.29-10.
.50-10.
.746-11.
.02-11.
.29-11.
.56-11.
.88-12.
.19-12.
.52-12.
.89-13.
.28-13.
.70-14.
.19-14.
.72-15.
.34-16.
.13-17.
.87-19.39

U, W, H, and UWH composites are listed in
Table XV.

TABLE XV
MEANS FOR CLASSES ON U, W, H, AND UWH COMPOSITES OF ADJUSTED CHANGE SCORES

Unit        Wrightstone   Hill        UWH
Adjusted    Adjusted      Adjusted    Adjusted
Change      Change        Change      Change

Thus, in the course of this section, eight
numerically different measures of average
pupil progress have been established for each
class. They are as follows:

1. Average U Composite Raw Change ($U_{R.C.}$)
2. Average W Composite Raw Change ($W_{R.C.}$)
3. Average H Composite Raw Change ($H_{R.C.}$)
4. Average UWH Composite Raw Change ($UWH_{R.C.}$)
5. Average U Composite Adjusted Change ($U_{A.C.}$)
6. Average W Composite Adjusted Change ($W_{A.C.}$)
7. Average H Composite Adjusted Change ($H_{A.C.}$)
8. Average UWH Composite Adjusted Change ($UWH_{A.C.}$)


Any one of these eight measures should represent
a valid measure of teaching ability
which might be used to rank the abilities of 
the several teachers studied, if the teaching 
situation (pupil capacities, environmental 
factors, etc.) in the 28 classrooms was every- 
where alike except for the one factor, teaching 
ability, which is under investigation. 

In addition to the scores on the initial and 
final administration of the eight Unit, Wright- 
stone, and Hill tests, it will be recalled from 
Section II that measures were obtained of 
pupil C.A., M.A., I.Q., Reading, and socio-economic
status. Table XVI lists the mean
and standard deviation for each of these
factors for each class and for the total pupil
population. An inspection of this table reveals
that the means of these several pupil
factors vary considerably from one class to 
another. 

Since the 28 pupil groups should be homo- 
geneous with respect to the factors M.A., 
I.Q., reading, socio-economic status, U com- 
posite initial test score, W composite initial 
test score, H composite initial test score, and 
UWH composite initial test score, if one is to 
base an evaluation of the several teachers’ 
abilities upon the progress of these groups, 
tests of homogeneity were made. 

In one test, the variance between the means 
of the 28 classes was contrasted with the vari- 
ance within the 28 classes. The ratio of these 


two variances is the statistic F, which enables
one to determine whether or not the variation
between class means is significantly greater
than the variation within classes. If the value
of F is larger than the tabled 5% or 1% value, then the
hypothesis that the 28 classes are homogeneous
is untenable.
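
A minimal sketch of the variance ratio follows, written for classes of possibly unequal size. The function name and the three small groups are hypothetical and serve only to show the arithmetic.

```python
def f_ratio(groups):
    """Ratio of the between-class mean square to the within-class mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Degrees of freedom: k - 1 between classes, n - k within classes.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores for three small classes.
classes = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f_ratio(classes))
```

For 28 classes and 375 pupils the ratio would carry 27 and 347 degrees of freedom, and it is this pair that determines the 5% and 1% values consulted.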

The other test was a chi-square test” to 
determine whether or not the variances of the 
pupil factors in the 28 classes were homo- 
geneous. It is conceivable that the 28 classes 
might have discrepant means but have homo- 
geneous variances. 
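
The chi-square test of homogeneity of variances cited here is Bartlett's. A sketch of its usual textbook form is given below; the function name and the sample groups are hypothetical, and the correction factor is the standard one rather than anything quoted from this report.

```python
import math
from statistics import variance


def bartlett_chi_square(groups):
    """Return Bartlett's statistic and its degrees of freedom (k - 1)."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]  # unbiased estimates
    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df
    m = total_df * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return m / correction, k - 1


# Hypothetical classes: the first set has equal variances, the second does not.
equal_chi, equal_df = bartlett_chi_square([[1, 2, 3], [4, 5, 6], [10, 11, 12]])
unequal_chi, unequal_df = bartlett_chi_square([[1, 2, 3], [0, 5, 10]])
print(round(equal_chi, 4), equal_df)
print(round(unequal_chi, 4), unequal_df)
```

Classes with very different means but equal variances give a statistic of zero, which is the situation the text describes as conceivable.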

Table XVII lists the values of F and chi- 
square resulting from these tests as well as 
the corresponding 5% and 1% values which 
are necessary to determine whether the cal- 
culated values refute the hypothesis of homo- 
geneity or not. 


TABLE XVII

F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF PUPIL FACTORS
(28 CLASSES)

Pupil Factor                 F       Chi-square
M.A.                                 47.03
I.Q.                                 38.98
Reading                              34.69
Socio-Economic Status                43.61
U Composite, initial
W Composite, initial
H Composite, initial
UWH Composite, initial

5% Value
1% Value


The values of F are uniformly much higher 
than the 1% value thus indicating that the 
means of the 28 classes, relative to these 
factors, are too discrepant to be adjudged 
homogeneous. The values of chi-square, how- 
ever, indicate that the variances of five of the 
eight factors are consistent with the hypothe- 
sis of homogeneous variances; the three 
factors whose chi-square values exceed the 
5% values are M.A., socio-economic status, 
and W composite-initial scores. 

In the original conception of this experiment,
this variation of pupil factors was
anticipated and it was planned to arrive at
homogeneous sets of pupils by eliminating the
least number of pupils necessary so that the
28 classes might all be equated with each
other on a pupil-to-pupil basis or at least on
the basis of comparable means and standard
deviations. After a lengthy examination of
the pupil data, however, it appeared that the
means of the various classes differed so widely
that neither of these processes could be carried
out without eliminating most of the 375 pupils
involved.

The test of significance is described by Snedecor in
Section 10.4. G. W. Snedecor, Statistical
Methods, (Ames, Iowa: Collegiate Press, 1938), pp. 182 ff.

A test for homogeneity of estimated variances
is given by M. Bartlett, "Properties of Sufficiency and
Statistical Tests," Proceedings of the Royal Society of London,
Series A, Vol. 160 (1937), pp. 273 ff.

It seemed best, therefore, to control the 
pupil factors statistically. The eight measures 
of average pupil progress listed on page -- 
were each considered to be a function of three 
things: (1) the ability of the teacher; (2) the 
capacity, social background, maturity, and 
initial achievement levels of the pupils; and 
(3) other factors not measured. It was 
assumed that the 28 classes were comparable 
as far as the “other factors not measured” 
were concerned. If this is true then the meas- 
ures of average pupil progress are functions 
of teacher ability and pupil factors plus a 
constant factor. 

The criterion of teaching ability being 
sought is that portion of the average pupil 
progress which is ascribable to the teachers’ 
influence. For the purpose of measuring the 
teaching ability of one teacher relative to an- 
other, it makes no difference if all of the 
teacher measures are too large or small by a 
constant, the constant being the effect of the 
“other factors not measured”. The teacher 
effect plus this constant can be obtained from 
the average pupil change by subtracting from 
the latter that portion of the pupil change 
that can be mathematically ascribed by mul- 
tiple regression to the pupil factors. 

In order to control validly the variability
of the pupil factors (class means of M.A.,
I.Q., reading, socio-economic status, and initial
test measures) it is, however, necessary
that the variances of the classes used in calculating
the multiple regression equations be
homogeneous. It will be recalled from Table
XVII that the 28 classes can be considered
as having homogeneous variances with respect
to all the pupil factors except M.A., socio-economic
status, and W composite-initial. To
arrive at a set of classes for which the variances
were homogeneous for all of these
factors, it was found necessary to delete four
classes. These classes were numbered 4, 11,
14, and 20. All subsequent calculations will
be based, then, upon the remaining 24 classes,
which have a total of 342 pupils.

By trial and error in eliminating those classes having extreme
variances, it was found that this was the least number
of classes which needed to be deleted in order that the remaining
classes might represent a group whose variances were
homogeneous on the several pupil factors.

Table XVIII lists the values of F and chi- 
square for testing the homogeneity of the 
means and standard deviations of the remain- 
ing 24 classes together with the 5% and 1% 
values which are necessarily based upon some- 
what different degrees of freedom than for the 
case of 28 classes. Thus by eliminating 4
classes, the values of chi-square for each of
these factors, except for socio-economic status,
which was not included in the original test
but added later, are less than the 5% value and
hence consistent, except as noted, with the
hypothesis of homogeneous variance.

In order to obtain a multiple regression
equation of these factors on average pupil
change, for the purpose of estimating that
portion of pupil change attributable to these
pupil factors, it was first necessary to calculate
the intercorrelations with average change.
Table XIX lists all of the correlations which
were necessary for each factor together with
the means and standard deviations of the 24
class means. These correlation coefficients
were each calculated by correlating the appropriate
set of 24 pairs of class means.

TABLE XVIII
F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF PUPIL FACTORS
(24 CLASSES)

Pupil Factors                F       Chi-square
Socio-Economic Status
U composite, initial
W composite, initial
H composite, initial
UWH composite, initial

The next step in correcting the average
composite changes as measured by one of the
eight composites for variability in the measured
pupil factors is to obtain the prediction
equation which enables one to calculate the
average change one would expect a class with
known capacities to make. To do this, eight
different equations were obtained, one for
each of the eight measures of pupil progress.

"The correlation between means is equal to the correlation
between measurements." J. P. Guilford, Psychometric
Methods, (New York: McGraw-Hill, 1936), p. 373.

TABLE XIX
CORRELATIONS BETWEEN PUPIL FACTORS (M.A., I.Q., READING, SOCIO-ECONOMIC STATUS) AND COMPOSITE MEASURES OF CHANGE
(MEANS AND S.D.'S FOR ALL FACTORS INCLUDED)
(N = 24)

Columns: M.A.; I.Q.; Reading; Sims; Composite Pre-test Measures (U, W, H, UWH); Composite Raw Change (U, W, H, UWH); Composite Adjusted Change (U, W, H, UWH)

Rows:
M.A.
U Composite Pre-Test
W Composite Pre-Test
H Composite Pre-Test
UWH Composite Pre-Test

-.17
-.28  -.40  -.09
.06  -.02
-.1k  -.01
11.71  129.74
9.60
1.00
14.19  15.28  14.68
7.02  6.34  9.98  2.62  .79  1.18  1.11
163.59  100.17  0.53

To illustrate the procedure, one may consider
the case of average change as measured
by the W composite raw change. The pupil
factors which are to be statistically controlled,
in this case, are M.A., I.Q., reading, socio-economic
status and W composite initial
status. The correlations which are involved
in obtaining this particular prediction equation
are listed in Table XX.

The beta coefficients which are necessary to
express the multiple regression equation of
W composite raw change on these five pupil
factors were obtained by the Doolittle
method. The values of the beta coefficients
turn out to be:

\[
\begin{aligned}
\beta_{01.2345} &= .335 \\
\beta_{02.1345} &= -.087 \\
\beta_{03.1245} &= -.318 \\
\beta_{04.1235} &= -.072 \\
\beta_{05.1234} &= -.374
\end{aligned}
\]

The multiple correlation coefficient between
these five pupil factors and W composite raw
change was .52.
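
How beta coefficients and the multiple correlation follow from a table of correlations can be sketched as below. Plain Gaussian elimination with partial pivoting is used here in place of the Doolittle work-sheet layout; both solve the same normal equations. The function names and the two-predictor correlations are hypothetical.

```python
import math


def standardized_betas(predictor_corr, criterion_corr):
    """Solve R * beta = r, R being the predictor intercorrelation matrix."""
    n = len(criterion_corr)
    # Augmented matrix [R | r], copied so the inputs are left untouched.
    a = [list(row) + [criterion_corr[i]] for i, row in enumerate(predictor_corr)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n + 1):
                a[row][j] -= factor * a[col][j]
    betas = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(a[i][j] * betas[j] for j in range(i + 1, n))
        betas[i] = (a[i][n] - known) / a[i][i]
    return betas


def multiple_r(betas, criterion_corr):
    """Multiple correlation: square root of the sum of beta times r."""
    return math.sqrt(sum(b * r for b, r in zip(betas, criterion_corr)))


# Hypothetical case: two predictors correlating .50 with each other
# and .50 each with the criterion.
betas = standardized_betas([[1.0, 0.5], [0.5, 1.0]], [0.5, 0.5])
print([round(b, 4) for b in betas], round(multiple_r(betas, [0.5, 0.5]), 4))
```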


Using the standard deviations listed in
Table XIX to convert these beta coefficients
into b coefficients and calculating the value
of the constant, the prediction equation between
these variables can be written:

\[
X_{c_2} = -.346X_1 - .035X_2 + .052X_3 - .031X_4 - .018X_5 + 2.87
\]

where $X_{c_2}$ = predicted average W composite raw change

$X_1$ = W composite initial score, Class Mean

$X_2$ = Reading score, Class Mean

$X_3$ = M.A. score, Class Mean

$X_4$ = Socio-Economic score, Class Mean

$X_5$ = I.Q. score, Class Mean
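
The use of this equation can be sketched in a few lines. The five coefficients and the constant are those printed above; the dictionary keys, the function names, and the example class means are hypothetical. The conversion shown for a beta coefficient is the usual one, b = beta times the criterion S.D. divided by the predictor S.D.

```python
# Coefficients of the W composite raw change equation as printed in the text.
W_RAW_COEFFS = {
    "w_initial": -0.346,
    "reading": -0.035,
    "ma": 0.052,
    "socio_economic": -0.031,
    "iq": -0.018,
}
W_RAW_CONSTANT = 2.87


def beta_to_b(beta, sd_criterion, sd_predictor):
    """Convert a standardized (beta) weight into a raw-score (b) weight."""
    return beta * sd_criterion / sd_predictor


def predicted_w_raw_change(class_means):
    """Average W composite raw change expected from the pupil factors alone."""
    return W_RAW_CONSTANT + sum(
        coeff * class_means[name] for name, coeff in W_RAW_COEFFS.items()
    )


# Hypothetical class means, chosen only to show the arithmetic.
example_class = {
    "w_initial": 13.0,
    "reading": 60.0,
    "ma": 160.0,
    "socio_economic": 15.0,
    "iq": 100.0,
}
print(round(predicted_w_raw_change(example_class), 3))
```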


The measure of initial status which was controlled in the
case of each measure of average change was the initial status
as measured by the same composite as the particular change,
i.e., when adjusting $U_{R.C.}$ or $U_{A.C.}$ the measure of initial
status which was controlled was initial status as measured by
the U composite; when adjusting $W_{R.C.}$ or $W_{A.C.}$ the W composite
of initial status was controlled; when adjusting $H_{R.C.}$ or
$H_{A.C.}$ the H composite of initial status was controlled.

The operations involved in the Doolittle method are
explained in any of the following references: C. C. Peters
and E. C. Wykes, "Simplified Methods of Computing Regression
Coefficients and Partial and Multiple Correlations," Journal
of Educational Research, XXIII (1931), pp. 383-393;
G. W. Snedecor, "Correlation and Machine Calculation,"
(Ames, Iowa: 1931), pp. 35-45; and J. P.
Guilford, Psychometric Methods, (New York: McGraw-Hill,
1936), pp. 303-397.

TABLE XX

CORRELATIONS EMPLOYED IN THE PREDICTION OF W COMPOSITE RAW CHANGES
FROM PUPIL FACTORS

Socio-Economic Status
W composite, initial
W composite, R.C.

TABLE XXI

BETA COEFFICIENTS AND MULTIPLE CORRELATION COEFFICIENTS FOR FIVE PUPIL
FACTORS AND EIGHT CRITERIA

                       Beta Coefficient for Factor
Dependent              1            2          3        4           5
Variable*              Composite    Reading    M.A.     Socio-      I.Q.     R
                       Initial                          Economic
                       Test                             Status
U composite R.C.       -.161        -.024      -.367    -.168       .349
W composite R.C.       -.374        -.318      .335     -.072       -.087
H composite R.C.       -.522        -.326      .198     -.086       .123
UWH composite R.C.                  -.353      .279     -.107       -.028
U composite A.C.                    -.020      -.361    -.165       .878
W composite A.C.                    -.208      .295     .014        -.014
H composite A.C.                    -.374      .421     .071        -.120
UWH composite A.C.                  -.338      .314     -.012       -.035

*R.C. = Raw score change; A.C. = Adjusted score change.


TABLE XXII

MULTIPLE REGRESSION EQUATIONS FOR PREDICTING AVERAGE PUPIL RAW AND ADJUSTED
CHANGES ATTRIBUTABLE TO PUPIL FACTORS

\[
\begin{aligned}
X_{c_1} &= -.173X_1 - .002X_2 - .044X_3 - .057X_4 + .055X_5 + 7.02 \\
X_{c_2} &= -.346X_1 - .035X_2 + .052X_3 - .031X_4 - .018X_5 + 2.87 \\
X_{c_3} &= -.743X_1 - .052X_2 + .045X_3 - .054X_4 + .036X_5 + 4.10 \\
X_{c_4} &= -.321X_1 - .315X_2 + .354X_3 - .378X_4 - .047X_5 + 30.99 \\
X_{c_5} &= -.045X_1 - .001X_2 - .024X_3 - .030X_4 + .033X_5 + 2.80 \\
X_{c_6} &= -.117X_1 - .016X_2 + .032X_3 + .004X_4 - .002X_5 - 1.21 \\
X_{c_7} &= -.871X_1 - .074X_2 + .119X_3 + .056X_4 - .045X_5 + 1.33 \\
X_{c_8} &= -.085X_1 - .224X_2 + .296X_3 - .032X_4 - .0438X_5 - 6.51
\end{aligned}
\]

In which
$X_1$ = Mean Pre Composite
$X_2$ = Mean Reading Score
$X_3$ = Mean M.A.
$X_4$ = Mean Socio-Economic Status Score
$X_5$ = Mean I.Q.
$X_{c_1}$ = Predicted Unit Raw Change
$X_{c_2}$ = Predicted Wrightstone Raw Change
$X_{c_3}$ = Predicted Hill Raw Change
$X_{c_4}$ = Predicted UWH Raw Change
$X_{c_5}$ = Predicted Unit Adjusted Change
$X_{c_6}$ = Predicted Wrightstone Adjusted Change
$X_{c_7}$ = Predicted Hill Adjusted Change
$X_{c_8}$ = Predicted UWH Adjusted Change

Having found this equation, the predicted W
composite raw change average for each of the
24 classes can be calculated.

By an exactly similar process, the predic- 
tion equations for the other seven measures 
of pupil progress were obtained. Table XXI 
lists the beta coefficients and multiple correla- 
tion coefficients corresponding to each solution 
and Table XXII lists all of the resultant pre- 
diction equations. 

By means of these eight prediction equa- 
tions, a predicted average pupil change was 
calculated for each of the eight measures of 
pupil change for each class. Table XXIII lists 
all of these predicted pupil changes. 

In Table XXIV are listed the means of the 
raw and adjusted pupil changes for the U 


composite, W composite, H composite and
UWH composite.

If we let g_o equal the observed pupil
changes, which are listed in Table XXIV,
and g_p the predicted pupil changes, which are
listed in Table XXIII, we obtain g_t, that
portion of g_o which measures the relative teaching
abilities of the several teachers, by subtracting
g_p from g_o, that is,

g_t = g_o - g_p


Table XXV lists these differences for each
teacher for each of the eight measures of pupil 
change. These eight measures will hereafter 
be referred to as the eight criteria of teaching 
ability. 
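
The subtraction just described can be illustrated with a short Python sketch. It uses the first prediction equation of Table XXII (predicted Unit raw change) as printed, and assumes that the five pupil factors enter in the order given in the legend. The class means and the observed change below are hypothetical values chosen only for illustration, and the function names are not taken from the study.

```python
# Coefficients of the first Table XXII equation (predicted Unit raw change),
# assumed order: pre composite, reading, M.A., socio-economic status, I.Q.
COEFFS = (-0.173, -0.002, -0.044, -0.057, 0.055)
CONSTANT = 7.02


def predicted_change(class_means):
    """Return g_p, the change predicted from the five pupil-factor means."""
    return sum(b * x for b, x in zip(COEFFS, class_means)) + CONSTANT


def teacher_criterion(observed_change, class_means):
    """Return g_t = g_o - g_p, the portion credited to the teacher."""
    return observed_change - predicted_change(class_means)


# Hypothetical class: the five means and the observed change are invented.
hypothetical_means = (10.0, 50.0, 150.0, 40.0, 100.0)
g_p = predicted_change(hypothetical_means)
g_t = teacher_criterion(2.50, hypothetical_means)
print(round(g_p, 2), round(g_t, 2))
```

A positive g_t marks a class that gained more than its pupil factors alone would predict; a negative g_t marks one that gained less.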


TABLE XXIII

PREDICTED AVERAGE PUPIL CHANGE ON INDICATED COMPOSITES ATTRIBUTABLE TO PUPIL FACTORS
OF M.A., I.Q., READING, AND SOCIO-ECONOMIC STATUS

Columns: Class*; Predicted Unit R.C.; Predicted Wrightstone R.C.; Predicted Hill R.C.; Predicted UWH R.C.; Predicted Unit A.C.; Predicted Wrightstone A.C.; Predicted Hill A.C.; Predicted UWH A.C.

(The cell values for the 24 classes are not legible in this copy.)

*These class numbers are those originally assigned; four classes were omitted to establish
homogeneity.
TABLE XXIV

MEAN OF OBSERVED PUPIL CHANGE SCORES FOR INDICATED COMPOSITES
(Raw Score)

Columns: U Composite R.C.; W Composite R.C.; H Composite R.C.; UWH Composite R.C.; U Composite A.C.; W Composite A.C.; H Composite A.C.; UWH Composite A.C.

(The cell values are not legible in this copy.)

CRITERIA OF TEACHING ABILITY: PORTION OF AVERAGE PUPIL CHANGE ATTRIBUTABLE TO TEACHER

C_1 = g_t (derived from U composite R.C.)
C_2 = g_t (derived from W composite R.C.)
C_3 = g_t (derived from H composite R.C.)
C_4 = g_t (derived from UWH composite R.C.)
C_5 = g_t (derived from U composite A.C.)
C_6 = g_t (derived from W composite A.C.)
C_7 = g_t (derived from H composite A.C.)
C_8 = g_t (derived from UWH composite A.C.)


Each of these criteria is a measure of that 
portion of the average pupil gain which is 
attributable to the effect of the teacher plus 
a quantity assumed to be constant for all the 
teachers under investigation. The 24 indices 
listed in any one of the eight columns repre- 
sent a set of comparable measures of teaching 
ability based upon pupil groups which were 
rendered comparable by statistically controlling
the effect upon pupil change of the variable
pupil factors: M.A., I.Q., reading,
socio-economic status, and initial status.

It is interesting to note that, for the 24 
teachers who make up the sampling from 
which Table XXV was obtained, six teachers 
had negative scores in all the criteria of teach- 
ing ability whereas four teachers had positive 
scores and were above average in all of the 
criteria. For the remaining teachers, no such 
clear trends are indicated. Most of the teach- 
ers appear to vary in their teaching abilities 
and show tendencies to be above average in 
some of the criteria and below average in 
other criteria. 


The distinctive value of the criteria of 
teaching ability presented here lies in the 
fact that ability is ascertained not by sub- 
jective supervisory ratings which permit of 
halo effects, shifting standards of evaluation, 
and influences extraneous to the teaching 
process itself, but by objective instruments 
of measurement impartially applied to the 
pupils in actual class-room situations. 


SECTION IV 


STATISTICAL VALIDITY OF SELECTED 
TEACHER MEASURES 


The purpose of the previous section was 
to establish criteria of teaching ability arrived 
at objectively and upon the assumption that 
changes produced in pupils are the most de- 
sirable criteria of teaching ability. The next 
step is to determine the statistical validity 
of the various teacher measures by intercor- 
relating all of the teacher measures with each 
of the eight criteria of teaching ability. 


It will be recalled from Section III that
scores on the following measures were obtained
for each of the 28 teachers. The
measures used and the notation to be applied
to them are as follows (for a fuller description of these see pp. 15-20):

T_1 Wrightstone: Abilities to Organize Research Materials
T_2 American Council Psychological Examination
T_3 Social Attitudes of Secondary School Teachers
T_4 Yeager: Scale of Attitude Towards Teachers
T_5 Torgerson: Mental Hygiene
T_6 Teachers College Psychological Examination
T_7 Test on Community Planning (Unit II)
T_8 Health Test (Unit I)
T_9 American Council Civics and Government Test
T_10 Torgerson Diagnostic Teacher Rating Scale (Investigator)
T_11 Bernreuter Personality Inventory, Bn
T_12 Orientation Test
T_13 Bernreuter Personality Inventory, Fc
T_14 Almy-Sorenson Rating Scale for Teachers (Investigator)
T_15 Bernreuter Personality Inventory, Bd
T_16 Michigan Rating Scale (Investigator)
T_17 Morris Trait Index L
T_18 Bernreuter Personality Inventory, Bs
T_19 Michigan Rating Scale (Supervisor)
T_20 Bernreuter Personality Inventory, Fs
T_21 Washburne Social Adjustment Inventory
T_22 Torgerson: Teacher Problems
T_23 Stanford Educational Aptitudes Test T-A
T_24 Stanford Educational Aptitudes Test A-R
T_25 Almy-Sorenson Rating Scale for Teachers (Supervisor)
T_26 Stanford Educational Aptitudes Test T-R
T_27 Torgerson Rating Scale (Supervisor)


In order that the arithmetical work in cal- 
culating this large number of correlation co- 
efficients might be facilitated, each raw score 
for each teacher as well as each criterion score 
for the group of 24 teachers was converted 
into a standard score by subtracting from it 
the proper mean and then dividing the differ- 
ence by the proper standard deviation of the 
distribution of scores for the 24 teachers, i.e.,

z_1 = \frac{X_1 - M_1}{\sigma_1}, \quad z_2 = \frac{X_2 - M_2}{\sigma_2}, \quad \text{etc.}

The standard scores thus obtained for the criteria
of teaching ability are presented in Table
XXVI.

The coefficient of correlation between any
two variables could then be obtained by use
of the formula

r_{12} = \frac{\sum z_1 z_2}{N}
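
As an illustration of this shortcut, the following Python sketch converts two sets of scores to standard scores and averages their cross-products. It assumes the standard deviation is computed by dividing by N (the population form), which is what makes the mean cross-product equal to the correlation coefficient. The scores are hypothetical and the function names are ours.

```python
from statistics import mean, pstdev


def standard_scores(values):
    """Convert raw scores to z-scores using the population SD (divide by N)."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


def correlation(x, y):
    """Pearson r as the mean cross-product of paired standard scores."""
    zx = standard_scores(x)
    zy = standard_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical scores for five teachers (not data from the study).
measure = [12, 15, 9, 20, 14]
criterion = [0.4, 0.9, -0.3, 1.1, 0.2]
print(round(correlation(measure, criterion), 3))
```

Once every measure is in standard-score form, each of the many correlations requires only a sum of products and one division, which is the saving in arithmetic referred to above.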
Table XXVII lists all of the resultant correlations
of (1) the criteria with each other,
(2) the teacher measures with the criteria,
and (3) the teacher measures with each other.
Interpretation of Table XXVII will be made
in this order.


(1) Interpretation of intercorrelations of 
criteria with each other.—Consideration of 
the intercorrelations of the criteria with each 
other raises the pertinent problems of deter- 
mining whether any significant relationships 
exist between raw and adjusted pupil changes 
for each of the four types of composites used, 
U, W, H, and UWH, and whether any sig- 
nificant relationships exist between the raw 
and adjusted pupil changes for the several 
combinations of composites. 

The correlations between raw and adjusted 
changes for the four types of composites may 
be presented as follows: 


Composite        r_{(R.C.)(A.C.)}


TABLE XXVI 


CRITERIA OF TEACHING ABILITY—PORTION OF AVERAGE PUPIL CHANGE ATTRIBUTABLE 
TO TEACHER EFFECT AND CONSTANT FACTORS 


(Standard Scores) 


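Columns: Class; C_1 (U Composite R.C.); C_2 (W Composite R.C.); C_3 (H Composite R.C.); C_4 (UWH Composite R.C.); C_5 (U Composite A.C.); C_6 (W Composite A.C.); C_7 (H Composite A.C.); C_8 (UWH Composite A.C.)

(The cell values are not legible in this copy.)
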
TABLE XXVII

INTERCORRELATIONS AMONG THE TEACHER MEASURES AND THE EIGHT CRITERIA (N = 24)

(Rows and columns are the teacher measures and the eight criteria; the cell values are not legible in this copy.)



From the above it is seen that the corre- 
lations between raw change and adjusted 
change scores for all of the composites, ex- 
cept for the H composite, are high. One can 
safely conclude from this that the use of raw 
change scores is as desirable as the use of 
adjusted change scores, providing that the 
variability of the other pupil factors is con- 
trolled. The fact that making adjustments to 
raw scores is very time consuming should 
speak favorably for the use of raw change 
scores for which pupil factors have been 
statistically controlled.

In order to ascertain whether adjusted
changes are differently interrelated than raw
changes are interrelated, the significance of
the difference between the obtained correlations
should be calculated. Fisher's z test
(R. A. Fisher, op. cit., Section 35),
used to measure the significance of differences
between correlations, was applied to all pairs
of correlations, with the result that no difference
between correlations was found to be significant.
This again is evidence of the fact
that one may use either raw or adjusted
change scores, but because of the relative ease
with which the former can be obtained and
used, it should be the more desirable of the
two methods for employing pupil change
scores.
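
The z test can be sketched in a few lines of Python. The sketch assumes the usual textbook form of the test, in which the two correlations are treated as coming from independent samples, and it takes the pair of correlations .980 and .959 from the footnote on the largest difference, with 24 cases each. The function names are ours, and the value obtained may differ slightly from the tabled figure quoted in that footnote.

```python
import math


def fisher_z(r):
    """Fisher's transformation z = atanh(r)."""
    return math.atanh(r)


def z_difference(r1, r2, n1, n2):
    """Return (difference in z, standard error of that difference)."""
    diff = abs(fisher_z(r1) - fisher_z(r2))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return diff, se


diff, se = z_difference(0.980, 0.959, 24, 24)
print(round(diff, 3), round(se, 3), diff < 2 * se)
```

A difference smaller than twice its standard error is the rule of thumb under which a pair of correlations is judged not to differ significantly.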

(2) Interpretation of intercorrelations be- 
tween teacher measures and the criteria—To 
facilitate interpreting the intercorrelations be- 
tween the various teacher measures and the 
criteria, it would be expedient to consider the 
teacher measures as falling under the following
categories:


The largest difference of the values of z, for the correlations
of .980 and .959, .3666, does not exceed twice the standard
... original thesis on file, University of Wisconsin Library, for ...
correlations.
(a) Supervisory Rating Scales
    1. Ratings by supervisors
    2. Ratings by investigator
(b) Intelligence
    1. American Council Psychological Examination
    2. Teachers College Psychological Examination
(c) Personality
    1. Bernreuter Personality Inventory
    2. Morris Trait Index L
    3. Washburne Social Adjustment Inventory
(d) Attitude Towards Teaching
    1. Yeager: Attitude Towards Teachers and the Teaching Profession
(e) Knowledge of Subject Matter
    1. Health Test (Unit I)
    2. Test on Community Planning (Unit II)
    3. Wrightstone: Abilities to Organize Research Materials
    4. American Council Civics and Government Test
(f) Social Attitudes
    1. Social Attitudes of Secondary School Teachers
(g) Mental Hygiene
    1. Torgerson: Test of Mental Hygiene
(h) Ability to understand disciplinary problems
    1. Torgerson: Test of Teaching Problems
(i) Teaching Aptitude
    1. Stanford Educational Aptitudes Test
(j) General information and freedom from superstitions and prejudices
    1. Lewerenz-Steinmetz Orientation Test

TABLE XXVII (Continued)

(The cell values of the continued portions of Table XXVII are not legible in this copy.)


The interpretation of the intercorrelations
between teacher measures and the criteria,
given in Table XXVIII as a subset of
Table XXVII, can be further facilitated when
it is determined how large a correlation must
be in order that it be significantly different
from zero. For 24 pairs of variates, a correlation
of .40 is statistically significant in
that there are only 5 chances in 100 that
such a correlation will arise from an uncorrelated
population, and a correlation of .52,
based upon an equal number of pairs of
variates, is highly significant and will arise
in only 1 chance out of 100 from an uncorrelated
population.
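
These two thresholds can be checked with a brief Python sketch. It relies on the standard relation between r and Student's t for N - 2 degrees of freedom, t = r sqrt((N - 2) / (1 - r^2)), solved for r. The two critical t values below are assumptions taken from ordinary two-tailed t-tables for 22 degrees of freedom; they are not quoted in the text.

```python
import math


def critical_r(t_critical, n_pairs):
    """Smallest |r| significant for n_pairs, given the tabled t for n_pairs - 2 d.f."""
    df = n_pairs - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Two-tailed t values for 22 degrees of freedom, taken from standard
# t-tables (an assumption; they are not quoted in the text).
T_05 = 2.074
T_01 = 2.819

print(round(critical_r(T_05, 24), 3), round(critical_r(T_01, 24), 3))
```

The results agree with the limits of .404 and .515 given in the legend of Table XXIX, of which .40 and .52 are the rounded forms.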

(a) Supervisory Rating Scales and Criteria 
of Teaching Ability.—As has been previously 
pointed out, the traditional method of ascer- 
taining teaching ability has been to have 
supervisors rate teachers on rating scales of 
one type or another. 

The correlations obtained from the data 
of this experiment, between the various rating 
scales and the criteria established on pupil 
changes, fail to reveal any significant rela- 
tionships between the two. Data are not here 
available to determine which is to be pre- 
ferred. It is quite possible that each measures 
a different aspect of teaching ability. 

(b) Intelligence and Criteria of Teaching
Ability. The coefficients of correlation previously
reported between intelligence and
teaching ability have not been large. The following
correlations between intelligence and
teaching ability have been reported: Knight,
.00; Somers, .43; Whitney, .03; Tiegs, .01;
Boardman, .33; Ullman, .15; Odenweller
(highest score) -.04, (median score) .00.
The criteria of teaching ability for the above
mentioned studies were based, however, upon
the supervisory ratings of teachers.

J. P. Guilford, Psychometric Methods (New York: McGraw-Hill,
1936), pp. 548-549 (Table R); or H. A. Wallace
and G. W. Snedecor, Correlation and Machine Calculation,
1931 (Ames, Iowa), pp. 62-23 (Table 16).

TABLE XXVIII
INTERCORRELATIONS OF TEACHER MEASURES WITH CRITERIA
N = 24

(Rows are the teacher measures; the cell values are not legible in this copy.)


Barr, Torgerson, et al., obtained a correlation
of .37 between teacher intelligence and
a composite of pupil gains, but it should
be recalled that this composite included pupil 
accomplishment quotients which tend to be 
unreliable. Thus, the correlation of .37 may 
be lower than it would have been had a more 
reliable index of pupil gain been used. 


The correlations between intelligence, measured
by the American Council Psychological
Examination, and criteria C_4 and C_8 are .58
and .57 respectively. These correlations are
statistically significant and indicate intelligence
to be an important factor in teaching
ability.

For purposes of interpretation, C_4 and C_8 (the UWH
composites) have more meaning than the other single composites,
since C_4 and C_8 represent a more complete picture of
teaching ability than do the single composites and include the
results from all of the tests applied to the pupils.

Intelligence as measured by the Teachers 
College Psychological Examination gives cor- 
relations which fall below the points of sta- 
tistical significance except for C, and C,, 
where the coefficients are .47 and .42, respec- 
tively, and are statistically significant. The 
coefficients of correlation between this meas- 
ure and C, and C,, .37 and .40 respectively, 
are insignificant although the latter closely 
approaches the point of statistical significance. 

Of these two intelligence tests, it appears 
that the American Council Psychological Ex- 
amination is more definitely associated with 
teaching ability as measured in this study 
than is the Teachers College Psychological 
Examination. 

(c) Personality and Criteria of Teaching
Ability. The role that personality plays in
teaching ability is one which persistently appears
in educational literature. Among the
correlations reported between measures of
personality (by rating scale method) and
teaching success (also measured by rating
scales) the following may be noted: Ruediger
and Strayer, ..7; Somers, .62; Odenweller,
.25 to .83. Barr, Torgerson, et al., obtained
a coefficient of .04 between a personality rating
scale and total pupil raw score. Odenweller
concluded that "The outstanding trait,
the one most closely associated with effectiveness
in teaching, is personality."

Previous investigations made use of rat- 
ing scales which permitted much subjectivity 
to enter and placed the measurement of per- 
sonality upon bases which could easily be 
shifted even by the same raters rating the 
same teachers at different times. In many 
instances, those who rated teachers on per- 
sonality traits also rated them on teaching 
ability so that a halo effect was obtained. 
Such halo effects, as pointed out by Knight,
tend to raise the coefficients of correlation. 
The correlation of personality ratings with 
supervisory ratings as measures of teaching 
ability is of doubtful validity. 

The results of the present study do not 
attach to personality as here measured the 
importance attached to it by other investi- 
gators. The correlations between personality, 
as measured by the Bernreuter Personality 
Inventory (Bn, Bs, Bd, Fs, and Fc) and the 
criteria of teaching ability do not reveal any 
statistically significant correlations. When 
scores on the Morris Trait Index L presumed 
to measure leadership are correlated with 
the criteria of teaching ability, no statistic- 
ally significant correlations are revealed. 
Scores on the Washburne Social Adjustment 
Inventory when correlated with the criteria 
of teaching ability likewise yield no statis- 
tically significant correlations. It then appears 
that personality as here measured is not im- 
portant in conditioning teaching ability when 
pupil change is employed as the criterion of 
teaching efficiency. 

(d) Attitude Towards Teaching and Criteria
of Teaching Ability. Previous investigations
have shown little correlation between
interest in teaching and teaching ability.
Barr, Torgerson, et al., found a correlation
of .04 between scores on the Strong Vocational
Interest Blank and pupil gains in total
raw scores. Ullman found a correlation of .02
between scores on the Cowdery-Strong Interest
Report Blank and teaching success. Both
of these interest blanks are so devised that
measures of interest in more than one occupation
or profession can be obtained from one
administration of the test.

A. L. Odenweller, Predicting the Quality of Teaching.
Contributions to Education, No. 676 (New York: Bureau of
Publications, Teachers College, Columbia University, 1936).

F. B. Knight, Qualities Related to Success in Teaching.
Contributions to Education, No. 120 (New York: Bureau of
Publications, Teachers College, Columbia University, 1922),
p. 4.

For the present study, the Yeager scale,
built upon the statements of high-school
seniors interested in teaching, was used.
Scores from this scale correlated with C, and
C, to the extent of .45 each. These correlations
are statistically significant and indicate
a negative relationship* between interest
in teaching as measured by the
Yeager scale and teaching ability. This seems
to suggest the opposite of Knight's statement,
"it is reasonable to suppose that genuine
teaching interest in one's work accounts
for a large part of teaching success."

(e) Knowledge of Subject Matter and Cri- 
teria of Teaching Ability—Most of the pre- 
vious investigations have defined knowledge 
of subject matter as the total scholarship 
record or as the scholarship record in the aca- 
demic portion of work taken by the teacher. 
Correlations obtained between such measures 
of subject knowledge and supervisory ratings 
have been reported as follows: Meriam, .28; 
Whitney, .97; Odenweller, .28. 

In the present study, the teacher’s knowl- 
edge of subject matter was confined to meas- 
urements in the subject area in which teaching 
occurred. Correlations between measures of 
subject knowledge (Unit test, Community 
Planning Test, and American Council Civics
and Government Test) and the criteria of 
teaching ability yield, in most instances, no 
significant correlations. These teacher meas- 
ures are primarily tests of information and 
indicate no significant relationship between 
knowledge of subject information and teach- 
ing ability. 

Correlations between the Wrightstone Abil- 
ities test, designed primarily to measure “non- 
informational” objectives of the social studies, 
and the criteria yield significant correlations 
for all criteria except C, and C,,. 

(f) Social Attitudes with Criteria of Teaching
Ability. The test of Social Attitudes of
Secondary School Teachers when correlated

* Editor's Note: Since the correlation coefficient is positive
and since low scores on the Yeager scale represent attitudes
favorable to the teaching profession, a negative relationship
between the criteria and attitudes favorable to teachers is
indicated.

F. B. Knight, Qualities Related to Success in Teaching.
Contributions to Education, No. 120 (New York: Bureau of
Publications, Teachers College, Columbia University, 1922).
with the criteria of teaching ability revealed
significant correlations with C,, C,, C,, and 
C, (.49, .50, .49, and .52 respectively); the 
correlations of this measure with C,, C,, C,, 
and C, were statistically insignificant (.26, 
.21, .29, and .29 respectively). It thus ap- 
pears that social attitudes have significant 
relationship to teaching ability when the 
criteria of teaching ability include measures 
of the “non-informational” objectives of edu- 
cation; when teaching ability is expressed in 
terms of information, social attitudes have 
an insignificant relationship. 

(g) Mental Hygiene and Criteria of Teaching
Ability. The correlations of .46 and .45
between scores on the Torgerson test of mental
hygiene and C, and C, respectively, indicate
that a teacher's knowledge of mental
hygiene is sufficiently associated with teaching
ability to be important. The significant
correlation of .50 with C, and highly significant
correlation of .54 with C, are convincing
evidence for this conclusion.

(h) Ability to Understand Disciplinary
Problems and the Criteria of Teaching Ability.
The Torgerson Teacher Problems test is
essentially concerned with whether a teacher 
can diagnose disciplinary problems and prop- 
erly proceed to correct disciplinary conduct. 
Correlations between this measure and all of 
the criteria are statistically insignificant ex- 
cept for the correlations obtained with C, 
and C, (.44 and .42 respectively). These cor- 
relations are significant and apply to criteria 
which are heavily weighted with the informa- 
tional aspects of teaching. It would then seem 
that teachers stressing information are better
equipped to diagnose and correct disciplinary 
cases than are teachers stressing other objec- 
tives of education. 

(i) Teaching Aptitude and Criteria of
Teaching Ability. The relationships of scores
obtained on the Stanford Educational Aptitudes
Test with the criteria of teaching ability
are in no case statistically significant.

(j) General Information and Freedom from
Prejudice with the Criteria of Teaching Ability.
The correlations of scores from the Orientation
Test with the criteria employed
are in no case statistically significant and
would seem to indicate, strangely enough, that
teachers' general information and freedom
from prejudice as measured by the Orientation
Test are of no consequence in teaching
ability.

It is possible to arrange these correlations 
so as to indicate those correlations which are 
significant. These data are given in Table 
XXIX. The significant correlations between 
teacher measures and criteria (correlations 
over .40 and less than .52) are indicated by 
a single asterisk while those correlations which 
are highly significant (.52 or over) are in- 
dicated by two asterisks. The unmarked cells 
indicate that statistically insignificant corre- 
lations were obtained. 

From Table XXIX it is possible to rank 
the teacher factors associated with the criteria 
of teaching ability as follows: 


1. Intelligence 

2. Social attitudes 

3. Knowledge of subject matter 
4. Interest in teaching* 

5. Mental hygiene 


The above order should not be construed 
as being fixed and immutable. Some difficulty 
was encountered in deciding whether social 
attitudes or knowledge of subject matter 
should be in second place. Yet this order is 
of particular interest in that intelligence is 
indisputably the most important single teacher 
factor associated with teaching ability. 

(3) Interpretation of intercorrelations of 
teacher measures with each other. Because
of the extremely large number of correlations 
between the various teacher measures with 
each other and because the central problem 
of this study is to examine the relationship 
between various teacher measures and criteria 
of teaching ability objectively determined, the 
interpretation of that portion of Table XXVII 
which is concerned with the correlations of 
teacher measures with each other must nec- 
essarily be limited to those correlations which 
are most significant and relevant to the cen- 
tral problem of this study. 

The relationship between intelligence, 7, 
and teacher ratings made by the investigator 
is statistically significant whereas such is not 
the case when the teachers are rated by their 
supervisors. 

The Health and Community Planning tests
when correlated with the other teacher measures
show no statistically significant correlations.
These two tests, heavily loaded with
information, bear no significant relationships
with any of the other teacher measures, which
is probably due to the fact that these tests
measured small subject areas.

* Editor's Note: High scores on the Yeager Test indicate a
critical attitude toward teachers and teaching; interest in
teaching is therefore negatively correlated with teaching ...

TABLE XXIX

SIGNIFICANT AND HIGHLY SIGNIFICANT CORRELATIONS BETWEEN TEACHER MEASURES
AND CRITERIA OF TEACHING ABILITY

(The cell entries are not legible in this copy.)

* Correlations statistically significant (r = .404 up to .515).
** Correlations highly significant statistically (r = .515 or over). Blank spaces denote correlations
statistically insignificant.

It is very interesting to observe that the 
Yeager scale, which was ranked as one of 
the best (r = .45) single teacher measures
associated with the criteria of teaching ability, 
yields no significant correlation with any of 
the other teacher measures. This indicates 
that attitude is a very desirable measure to 
incorporate into a battery of tests designed 
to predict teaching ability since this attitude 
correlated highly with the criterion and low 
with the other teacher measures. 


The highly significant correlations between 
the Morris Trait Index L scores and the 
scores on Orientation Test and the Torgerson 
Test of Mental Hygiene (.53 and .52 respectively)
appear to indicate that those possessing
a high degree of leadership as here measured,
and those who are well acquainted with
the fundamental objectives of education also 
tend to be able to diagnose the mental 
problems of their pupils. 

All of the correlations except one between 
scores made on the Torgerson Test of Teach- 
ing Problems and ratings on the teacher rating 
scales are statistically significant. We may in- 
fer from this that the ability of a teacher to 
diagnose and treat disciplinary cases is an 
important factor in the ratings of super- 
visors. 

The very highest correlations, ranging from
.62 to .91, obtained by correlating the rating
scales with each other are probably spurious
due to halo effect. Raters seem to arrive at
the same conclusion or evaluation regardless
of the scale used.


SECTION V 
PREDICTION OF TEACHING ABILITY 


From the correlations listed in Table 
XXVII it is possible to compute a number
of multiple correlations so that the relation- 
ships between various combinations of teacher 
measures with the criteria used can be de- 
termined. By applying the Doolittle method 
to this intercorrelational matrix, the beta 
coefficients, necessary for computing both the 
multiple correlations and multiple regression 
equations, are easily obtained. 
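
The computation can be sketched in Python as follows. Ordinary Gaussian elimination is used here in place of the tabular Doolittle routine; both solve the same normal equations for the beta coefficients. The multiple correlation is then found from R^2 = sum of beta_i times r_i, where r_i is the correlation of measure i with the criterion. The correlations in the example are hypothetical, and the function names are ours.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def multiple_r(predictor_corr, criterion_corr):
    """Return (betas, R) from predictor intercorrelations and validities."""
    betas = solve(predictor_corr, criterion_corr)
    r_squared = sum(b * r for b, r in zip(betas, criterion_corr))
    return betas, math.sqrt(r_squared)


# Hypothetical correlations for two teacher measures and one criterion.
betas, big_r = multiple_r([[1.0, 0.2], [0.2, 1.0]], [0.5, 0.4])
print([round(b, 4) for b in betas], round(big_r, 4))
```

Adding a further measure enlarges the matrix by one row and one column, so the effect of each successive addition on R can be followed by re-solving with the larger matrix.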

The multiple correlations obtained for com- 
binations of teacher measures progressively 
increasing from two up to and including 14 
measures with each of the eight criteria of 
teaching ability are listed in Table XXX.

From Table XXX it is easy to ascertain
the influence of adding a new measure to the
preceding pool of teacher measures. The successive
addition of measures, in most cases,
leads to progressive increases in the sizes of
the multiple correlations. The largest coefficient
of multiple correlation is .86 and was
obtained between 14 variables and the criterion
C,. The multiple correlations between
14 variables and the criteria C, and C, are
.84 and .85 respectively.

In the calculations necessary to obtain the beta coefficients,
all of the 27 teacher measures and the ...
were arranged in one table with the expectation ...
the multiple correlations of combinations of two up to
and including the 27 variables, arranged in progressive ...,
would be obtained with each of the eight criteria of teaching
ability. Unfortunately, ...

TABLE XXX
MULTIPLE CORRELATIONS OF TEACHER MEASURES WITH CRITERIA

Rather than carry on any further analysis
with all of the multiple correlations listed in
Table XXX, it was thought advisable to
restrict analysis to those multiple correlations
which would seem to be the most meaningful.
When the eight criteria are examined to determine
which of these are most meaningful,
it seems that C_4 and C_8, because they are
composites of the other criteria and because
they embody a more complete picture of
teacher efforts than does any one of the other
criteria, would be most meaningful. Further
analysis will be limited, because of this, to
results obtained by employing these criteria,
C_4 and C_8.

Since the coefficients of multiple correla- 
tion were obtained by adding each time a 
new variable to the preceding pool of vari- 
ables, it is possible to obtain a rough estimate 
of the additive value of each variable as it 
is introduced into preceding pools of teacher 
measures. By obtaining the successive differences
between the sizes of the multiple correlations
for the criteria C_4 and C_8, it appears
that the six measures contributing most to
these correlations, arranged in descending
order of magnitude, are as follows:*

* Editor's Note: This listing is meaningful only when the ...
Between ... and T, there are two measures and values, namely
T, and T,, etc.


(The two rank-ordered lists, one for each criterion, are not legible in this copy.)



Combining these two sets by use of the mean 
rank order*® gives the following single list: 


T, (American Council Psychological 
Examination) 

(Torgerson—Mental Hygiene) 
(American Council Civics and Gov- 
ernment Test) 

(Yeager—Attitudes Toward Teach- 
ers) 

(Community Planning, Unit II)


T,, (Bernreuter—Bn) 
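The ranking procedure described above (take the successive differences between multiple correlations as each variable enters the pool, rank the variables by those differences for each criterion, then combine the two rankings by mean rank) can be sketched as follows. The cumulative correlations and the labels T1 to T5 are hypothetical placeholders chosen for illustration; they are not the Table XXX values, and ties are not handled.

```python
# Hypothetical cumulative multiple correlations (NOT the Table XXX values):
# R after adding T1, then T2, and so on, for each of two criteria.
cumulative_r = {
    "criterion_a": [0.40, 0.55, 0.58, 0.70, 0.72],
    "criterion_b": [0.20, 0.30, 0.35, 0.65, 0.67],
}
measures = ["T1", "T2", "T3", "T4", "T5"]


def increments(r_values):
    """Gain in R attributable to each variable as it enters the pool."""
    gains = [r_values[0]]
    for previous, current in zip(r_values, r_values[1:]):
        gains.append(current - previous)
    return gains


def ranks(gains):
    """Rank 1 = largest gain. Ties are not handled in this sketch."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    result = [0] * len(gains)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def mean_rank_order(cumulative, names):
    """Order the measures by their mean rank across all criteria."""
    per_criterion = [ranks(increments(r)) for r in cumulative.values()]
    mean_ranks = {
        name: sum(r[i] for r in per_criterion) / len(per_criterion)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: mean_ranks[name])


combined = mean_rank_order(cumulative_r, measures)
print(combined)
```

Because each increment depends on the order in which variables were added, such a list is only a rough estimate of additive value, as the text itself cautions.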


The selection of C, and C, for further anal- 
ysis also raised the question as to how sig- 
nificant the multiple correlations were in 
which these criteria had been used. By apply- 
ing tests of significance,*’ the following mul- 
tiple correlations were found to be significant 
at the 5% but not significant at the 1% 
level of confidence. 


J. P. Guilford, op. cit., pp. 246-247.
Ibid., Table K, pp. 548-549.
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Re,.T,.--T, =.77. Re,.T,... -76 
Re,.T,..-T, ==-79 Re,.7;... 77 
Re,.7;..- == 8: Ke,.7,... 80 
Re,.T,.-+-Ty, = 83 = Reg. Ty. . . Ti) = Be 

Re,.7,...74, == B32 


The same test of significance when further
applied to those multiple correlations obtained
with C, and C, showed the following to be
highly significant, i.e., significant at the 1%
level of confidence.

i ee | a 67 

y SP |  * FP 67 
Reé,.3..- -70 
Re,.7... 73 
} Ae 74 





in which $X_c$ indicates the criterion or dependent
variable and $X_1, X_2, X_3, \ldots, X_n$ the
teacher measures or independent variables.


Since, however, in standard score form, the 
means of the criteria equal zero and the crit- 
eria standard deviations equal one, the gen- 
eral formula can be rewritten: 


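$$
Z_c = \frac{\beta_{c1}}{\sigma_1}X_1 + \frac{\beta_{c2}}{\sigma_2}X_2 + \cdots + \frac{\beta_{cn}}{\sigma_n}X_n - \frac{\beta_{c1}}{\sigma_1}M_1 - \frac{\beta_{c2}}{\sigma_2}M_2 - \cdots - \frac{\beta_{cn}}{\sigma_n}M_n
$$
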
The proper division of the beta coefficients
and substitutions gives the following multiple
regression equations:

$$Z_{1 \ldots 7} = -.004X_1 + .006X_2 + .004X_3 + .802X_4 + .033X_5 - .002X_6 + .062X_7 - 11.965$$

$$
\begin{aligned}
Z_{1 \ldots 11} ={}& .014X_1 + .012X_2 + .002X_3 + .550X_4 + .033X_5 - .004X_6 + .047X_7 \\
&- .041X_8 - .006X_9 - .035X_{10} - .003X_{11} - 7.890
\end{aligned}
$$

$$Z_{1 \ldots 6} = .015X_1 + .007X_2 + .001X_3 + .678X_4 + .031X_5 - .006X_6 - 8.382$$

$$
\begin{aligned}
Z_{1 \ldots 11} ={}& .016X_1 + .004X_2 - .000X_3 + 1.020X_4 + .040X_5 + .001X_6 + .020X_7 \\
&+ .048X_8 - .015X_9 + .008X_{10} + .002X_{11} - 14.348
\end{aligned}
$$





It is interesting to observe that when the
multiple correlations found for C, and C, are
interpreted on the basis of significance,
those correlations with the fewer number of
variables are the most significant and that
beyond a point the addition of new variables
adds little significant value.

The above lists are still rather long, and 
for the sake of expediency and economy it 
was thought desirable to limit all subsequent 
analyses to the largest multiple correlations 
in either of the lists above. This then resulted 
in the following: 


Tests 1-7:  R = .77        Tests 1-6:  R = .74
Tests 1-11: R = .83        Tests 1-11: R = .82


Since the beta coefficients** had already 
been found when the multiple correlations 
were calculated, it is now very easy to obtain 
four multiple regression coefficients for pre- 
dicting teaching ability as defined by the 
criteria C, and C,. The general form of the 
multiple regression equation can be written as: 


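$$
\begin{aligned}
X_c ={}& \beta_{c1}\frac{\sigma_c}{\sigma_1}X_1 + \beta_{c2}\frac{\sigma_c}{\sigma_2}X_2 + \cdots + \beta_{cn}\frac{\sigma_c}{\sigma_n}X_n \\
&+ M_c - \beta_{c1}\frac{\sigma_c}{\sigma_1}M_1 - \beta_{c2}\frac{\sigma_c}{\sigma_2}M_2 - \cdots - \beta_{cn}\frac{\sigma_c}{\sigma_n}M_n
\end{aligned}
$$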


% A table giving these beta coefficients will be found in 
the original thesis on file in Library, University of Wisconsin. 


The unknowns for these four regression equations are
designated as follows:
$X_1$ = score on the Wrightstone, Abilities to Organize Res. Test
$X_2$ = score on the American Council Psychological Examination
$X_3$ = score on Social Attitudes of Secondary School Teachers
$X_4$ = score on Yeager, Attitude Towards Teachers
$X_5$ = score on Torgerson, Mental Hygiene Test
$X_6$ = score on Teachers College Psychological Examination
$X_7$ = score on Community Planning (Unit II)
$X_8$ = score on Health Test (Unit I)
$X_9$ = score on American Council Civics and Government Test
$X_{10}$ = score on Torgerson Rating Scale (Investigator)
$X_{11}$ = score on Bernreuter Personality Inventory, Bn
Zey-y.. 7 = Predicted score on UWH, , (Including Tests 
1-7, incl.) 
= Predicted score on UWH, , (Including Tests 
1-11, incl.) 
« = Predicted score on UWH, , (Including Tests 
1-6, incl.) 
Zc, . . 4 = Predicted score on UWH, , (Including Tests 
1-11, incl.) 


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By use of a formula given by Kelly, it is 
found that the standard deviations of es- 
timated standard scores for the above equa- 
tions are equal to .634, .562, .676, and .575 
respectively.” 
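The formula attributed to Kelly is not printed in the text. As a rough check, and assuming it is the familiar standard error of estimate for a criterion in standard-score form, $\sqrt{1 - R^2}$, the four retained multiple correlations (.77, .83, .74, .82) can be compared with the four reported values. The pairing by order is also an assumption of this sketch.

```python
import math

# Multiple correlations retained for the four prediction equations, paired
# with the standard deviations reported in the text. Pairing them by order
# of appearance is an assumption made for this sketch.
reported = [
    (0.77, 0.634),
    (0.83, 0.562),
    (0.74, 0.676),
    (0.82, 0.575),
]


def standard_error_of_estimate(r):
    """Standard error of estimate for a criterion in standard-score form."""
    return math.sqrt(1.0 - r * r)


for r, reported_value in reported:
    computed = standard_error_of_estimate(r)
    print(f"R = {r:.2f}  computed = {computed:.3f}  reported = {reported_value:.3f}")

# Largest disagreement between the computed and reported values.
largest_gap = max(abs(standard_error_of_estimate(r) - s) for r, s in reported)
```

Under these assumptions the computed and reported values differ only in the third decimal place, which is about what rounding each R to two decimals would produce.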

The above equations were tested by sub- 
stituting for the unknowns the appropriate 
mean scores obtained from the group of 24 
teachers. The resultant prediction was equal 
to zero thus confirming the fact that the 
predicted score for the average teacher would, 
on the basis of standard scores, be equal to 
zero. 
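The check described above follows directly from the standard-score form of the equation: each raw-score weight is a beta divided by the measure's standard deviation, and the constant is minus the weighted sum of the means, so substituting the means must return zero. The sketch below illustrates this with three hypothetical measures; the betas, standard deviations, and means are invented for illustration and are not the study's values.

```python
# Hypothetical betas, standard deviations, and means for three teacher
# measures (illustrative only; the study's beta table is not reproduced here).
betas = [0.40, 0.25, -0.10]
sigmas = [12.0, 30.0, 5.0]
means = [140.0, 330.0, 90.0]

# Raw-score weights: each beta divided by its measure's standard deviation.
weights = [beta / sigma for beta, sigma in zip(betas, sigmas)]
# Constant term: minus the weighted sum of the means.
constant = -sum(w * m for w, m in zip(weights, means))


def predict(scores):
    """Predicted criterion standard score for one teacher's raw scores."""
    return sum(w * x for w, x in zip(weights, scores)) + constant


# Substituting the mean of every measure gives the average teacher.
at_means = predict(means)
print(round(at_means, 10))
```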


It is also possible to test the accuracy of 
predicted scores as against obtained scores for 
any one or all of the prediction equations 
given. As an illustration of this, the following 
scores of two teachers, selected at random, 
were substituted for the independent variables 
in each of the multiple regression equations: 


Teacher 25 


143 
339 
141
3.8 
94 
148 
78 
75 
133 
74 
159 


Teacher 1 


The predicted scores obtained as contrasted 
with the observed scores are as follows: 


Dependent          Standard Error     Teacher 1
Variable           of Estimate        Predicted Score
Z (Tests 1-7)      .634               -.534
Z (Tests 1-11)     .562               -.588
Z (Tests 1-6)      .676               -.031
Z (Tests 1-11)     .575                .209

It is thus seen that in each case illustrated 
the predicted score approximates the observed 
score within a small fraction of the standard 
error of estimate. These prediction equations 
can then be used to predict teaching ability 
with considerable accuracy. 
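As a worked illustration of the substitution, the sketch below applies the seven-measure equation to Teacher 25. The coefficients and the teacher's first seven scores are readings from a damaged copy (including 141 for the third score), so both should be treated as assumptions; the obtained score of .800 and the standard error of .634 are the values reported for this equation.

```python
# Coefficients of the seven-measure equation as read from the text, and
# Teacher 25's first seven scores in the order listed. Both readings are
# assumptions made from a damaged copy.
coefficients = [-0.004, 0.006, 0.004, 0.802, 0.033, -0.002, 0.062]
constant = -11.965
teacher_25 = [143, 339, 141, 3.8, 94, 148, 78]

# Predicted criterion standard score: weighted sum of scores plus constant.
predicted = sum(c * x for c, x in zip(coefficients, teacher_25)) + constant
obtained = 0.800
standard_error = 0.634

# Gap between predicted and obtained scores, in standard-error units.
gap_in_se = abs(obtained - predicted) / standard_error
print(round(predicted, 3), round(gap_in_se, 3))
```

On these readings the prediction agrees with the value of .751 reported for Teacher 25, and the gap from the obtained score is less than a tenth of the standard error of estimate.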

The use of prediction equations to deter- 
mine teaching ability objectively is very use- 
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ful in several ways. For example, teacher 
training institutions can determine the degree 
of teaching ability of their students from the 
application of several measures to the pros- 
pective teachers. Superintendents can make 
use of such prediction equations to ascertain 
the teaching ability of candidates for teach- 
ing vacancies. Yet, before prediction equa- 
tions of the sort here illustrated are to be 
used on a large scale, many more studies, to 
determine degrees of teaching ability ob- 
jectively, are necessary. Such studies will 
have to be carried on in many subject areas, 
on different grade levels, and in all parts 
of the country. 


SECTION VI 
SUMMARY AND CONCLUSIONS 


The purpose of this study is to determine
the relationship between certain teacher measures
and measurable pupil changes.

The results of this study indicate that: 


1. The intelligence of the teacher is the 
highest single factor conditioning teaching 
ability and remains so even when in combina- 
tion with other teacher measures. 

2. The social attitudes of social studies
teachers are an important factor in teaching
ability.

3. The attitude of teachers towards teach- 
ing is significantly correlated with teaching 
ability. It should be recalled that high scores 
indicate a critical attitude toward teachers 
and teaching. 





Teacher 25
Predicted Score    Obtained Score
.751               .800
.176               .800
.823               .910
.610               .910





4. Knowledge of subject matter and the
ability to diagnose and correct pupil mental 
maladjustment are each significantly associ- 
ated with teaching ability. 

5. The correlations between supervisory
ratings of teachers and the criteria of teaching
ability used in this study are not statistically
significant.
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6. Personality, as here defined and meas- 
ured, shows no significant relationship to 
teaching ability. 

7. The multiple regression equations give
predicted scores which closely approximate
the obtained scores in those examples in which
the prediction equations are used.

The findings of this study place different
emphases upon those qualities associated with
teaching ability than those heretofore presented.
The fact that the criteria here used
were those of pupil change objectively measured
appears to bring a different constellation
of qualities to be associated with teaching
ability.

The findings here presented must be inter- 
preted on the basis of the experimental pat- 
tern and measures, both of pupils and teach- 
ers, used in this study. Subsequent studies are 
necessary to verify the findings here presented. 





THE MEASUREMENT OF TEACHING ABILITY 
STUDY NUMBER TWO 


J. F. Rolfe
La Crosse State Teachers College 
La Crosse, Wisconsin 


SECTION I 
THE PROBLEM 


This is the second of a series of studies 
conducted to ascertain the validity of certain 
instruments commonly employed in the meas- 
urement of teaching ability. The results 
secured from the first study’ of this series 
were sufficiently promising to suggest further 
research and the study here reported was 
undertaken to further test these findings. To 
secure comparable results, approximately the 
same sorts of teachers, schools, and measuring 
devices were employed as in the preceding 
investigation. Teaching is an exceedingly 
complex activity, and the controls are not so 
complete as in some fields of research. Thus 
this follow-up study. 


SPECIFIC DESIGN OF THE EXPERIMENT


The design of the experiment, the measur- 
ing instruments used, and the procedures 
followed were the same as those of the pre- 
ceding study except in certain details to be 
indicated later. 

In the first investigation’ only teachers of 
the 7th, 8th, or combined 7th and 8th grades 
were used. This furnished evidence concerning 
the teaching ability of teachers in small urban 
and first class state graded schools, where the 
teacher taught one or, at the most, two grades. 
The teachers in the study here reported were 
from one- and two-room rural schools. Teachers
from this type of school comprise one of
the largest groups of teachers in the state of 
Wisconsin and hence merited the considera- 
tion here given. 

1 L. E. Rostker, The Measurement and Prediction of Teaching
Ability (unpublished Doctor's Dissertation, Madison,
Wisconsin, 1939).

Following the plan as outlined in the original
proposal, the combined 7th and 8th grade
class in Citizenship of each participating
school was selected for the study. The following
factors were taken into consideration in
determining the grade and subject area
selected:

(1) Citizenship was being taught during 
the 1937-1938 school year at the 7th- and 
8th-grade level in all the rural schools. 

(2) Pupil change in the area of the
Social Studies was considered of growing
importance among educational objectives.

(3) The objectives of Citizenship were
broad enough to allow for considerable variability
in the techniques of teaching used
by the teachers selected.

(4) More desirable measuring instruments
were available in this area and at
this grade level than in certain other areas
and grade levels.


The following limitations were set up in the 
selection of participating schools: 


(1) The schools were to be one- or two- 
room rural schools, employing one and no 
more than two teachers. 

(2) Citizenship was to be taught at the 
7th- and 8th-grade level throughout the 
course of the school year. 

(3) At least five pupils were to have 
been enrolled in the combined 7th and 8th 
grades. 

(4) The teacher must be willing to par- 
ticipate in the investigation. 


Shortly after the opening of the school year
in the fall of 1937 a large number of schools
were visited and the proposed plan described
to the teachers. The teachers in those schools
meeting the above requirements were invited
to participate. A sufficient number of teachers
agreed to participate, and a group of 72
schools located in Eastern Dane, Western
Dane, and Columbia Counties within a radius
of 35 miles of Madison were selected.

The participating teachers agreed to accept 
the following general objectives for the year’s 
course in Citizenship: 
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The teacher should aim in the citizenship 
course to develop the pupil’s ability to dis- 
charge his socio-civic responsibilities with 
intelligence and efficiency. This, as set forth 
in the directions to the teachers, was to 
include: (1) efficiency in human relationships:
the ability to get along with other
people; the ability to work cooperatively 
with others in group enterprise; the ability 
to subordinate one’s individual gain to 
group welfare where one conflicts with the 
other, etc.; (2) understanding of the funda- 
mentals of socio-civic relationships: knowl- 
edge of one’s civic rights, duties, and re- 
sponsibilities; knowledge of social, eco- 
nomic, and political principles and practices; 
knowledge of moral, ethical, and religious 
conventions; understanding of the prin- 
ciples and practices of the government 
under which one lives, etc.; (3) the mastery 
of the tools for effective thinking: the 
ability to see and solve problems; the abil- 
ity to collect relevant data, make the neces- 
sary analyses, and reach judgments based 
upon fact; the ability to suspend judgment, 
maintain an open mind, and view solutions 
critically in arriving at decisions, etc.; and 
(4) interest in socio-civic relationships, 
activities, and responsibilities: willingness 
to accept socio-civic responsibilities; the 
faithful discharge of these responsibilities; 
and an attitude of tolerance toward the 
opinions and actions of others. 


The procedure followed in the collection of
the data was: (1) the administration of a
battery of tests designed to measure the general
objectives of the year's course to all participating
pupils at the beginning and end of
the school year so as to obtain measures of
pupil changes occurring over a six-month
period; (2) the application of appropriate
pupil measures just prior to and immediately
following the teaching of
two three-week units in the field of citizenship,
one of these units to be taught in the
fall of the year, the other in the spring of the
same school year;* (3) the application of an
intelligence test and a reading test to each
pupil at the beginning of the year to be employed
in the equating of groups; and (4) the
application of various measures to the teacher.
The validation of the measures applied to the

* These a top ond State Course of 
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th; are Unit II, Conmunity Planning. ws 


teachers is one of the primary concerns of this 
investigation. 

Several weeks before the teaching of the 
first unit was begun each teacher was visited 
by the investigators, and final arrangements 
were made for the administration of the vari- 
ous pupil measures. Due to the number of 
teachers and the number of tests to be admin- 
istered, it was necessary to stagger the testing 
periods from school to school. The time
schedule for each school, however, was carefully
controlled so that the period between
initial testing and final testing was six months
for the long-time changes and three weeks for
the units. The number and length of
citizenship class periods per week were held 
constant for all teachers. 


Several weeks before the teaching of the 
first unit on public health was to begin, a 
letter was sent to each teacher giving the 
dates upon which the testing and teaching 
were to begin. A statement of the topics to be 
covered and the general objectives were also 
sent as a guide in teaching the unit.* The 
teacher was free to employ whatever means 
she deemed appropriate, that is, she was free 
to use any subject matter, or materials that 
she thought best in the attainment of the 
objectives. Before any teaching was begun two 
batteries of “over all” long time measures 
were applied to the pupils, namely, (1) three 
Wrightstone tests, and (2) three Hill tests.‘ 
Immediately prior to the teaching of unit one, 
the Health test designed to measure the objec- 
tives of this unit was administered to each 
class. Teaching on this unit then continued 
for 13 successive days and on the 15th day 
the same test given as a pretest was admin- 
istered as a final test. Following the teaching 
of this unit, which took place in October, the 
teacher resumed her normal course of study. 
In the spring of the same school year the 
teachers were informed of the exact dates for 
the teaching of the second unit on Community 
Planning. As in the first unit, the teachers 
were sent a list of topics and the general 
objectives to be followed. Shortly after the 
teaching of the second unit, the pupils were 
again given the two batteries of tests, namely, 

* See Appendix "B" of original thesis on file, University of
Wisconsin Library, for instructions and announcements sent to
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TABLE I 


SEX, AGE, YEARS OF TEACHING EXPERIENCE, YEARS OF TRAINING BEYOND HIGH SCHOOL,
AND MONTHLY SALARY FOR EACH PARTICIPATING TEACHER
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4 
7 
7 
3 
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(1) three Wrightstone tests, and (2) three 
Hill tests, which had been administered as 
initial tests in the fall.° 


DESCRIPTION OF THE TEACHERS PARTICI- 
PATING IN THIS INVESTIGATION 


This investigation started with 72 teachers. 
Fifteen of those who started the investigation 
were subsequently dropped because of incom- 
plete data, leaving a total of fifty-seven 
schools from which complete pupil and 
teacher data were obtained. 

The 57 teachers participating in this inves- 
tigation were teaching in either one-room or 
two-room rural schools in rural areas of Dane 
and Columbia counties. Table I summarizes 
the information concerning five factors, 
namely, sex, age, years of teaching experi- 
ence, years of training beyond high school, 
and monthly salary for each participating 
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28.26 8.05 1.46 91.75 

teacher. It will be noted that fifty of the
teachers were women, seven were men. The
range of chronological age extended from 20
to 54 years inclusive, with a mean age of
28.3 years and a median of 26 years.

The total years of teaching ranged from 1
to 30 years with a mean of 8.05 years and a
median of 7 years. This was the first year of
teaching for five teachers.

All teachers had at least one year of train- 
ing beyond high school with the average being 
1.46 years. Thirty-two teachers had attended 
a county normal school for only one year. 
Three teachers were college graduates and 
one had received a Master’s degree. 

The monthly salaries ranged from sixty- 
five dollars to one hundred forty-five dollars, 
with an average monthly salary of ninety-one 
dollars and seventy-five cents. 

The typical teacher participating in this 
investigation could be described as a woman 
28.3 years old, having taught 8.05 years, re- 
ceived 1.46 years of professional training and 
earned $91.75 per month for the school year 
1937-38. 
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TABLE II 
SCHOOL, GRADE, AND NUMBER OF PUPILS PARTICIPATING IN THIS STUDY


No. of 
Pupils 
14 
6 
10 
15 


School 
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DESCRIPTION OF THE PuPILs PARTICIPATING 
IN THIS INVESTIGATION


There were 404 pupils in the 57 classes 
who participated in this investigation. The 57 
classes ranged in size from 4 to 18 with an 
average of seven pupils per class. The small 
average class size was due to the fact that all 
but seven schools were one-room rural schools. 
Table II furnishes information relative to 
schools, grades and number of pupils partici- 
pating in this investigation. 

For the particular area of the curriculum 
investigated by this study all schools followed 
the practice of combining the seventh and 
eighth grades. 


DESCRIPTION OF THE SCHOOLS PARTICIPATING 
IN THIS INVESTIGATION


The schools participating in this investiga- 
tion include 50 one-room rural schools and 
seven two-room rural schools. A one-room 
rural school is a school having one classroom 
with one teacher who teaches all subjects in 
grades one through eight. A two-room rural 
school is one having two classrooms, one 
teacher teaching grades one through four, the 
other teaching grades five through eight. 

The typical school may be described as a 
one-room rural school with one teacher, a 
total enrollment of 21 pupils with seven pupils 
in grades seven and eight. The district per 
pupil valuation is $12,315.00 and the total 
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Total Number of Participating 
404 


per pupil expenditure for the school year is 
$65.00. Table III shows this information for 
each participating school. 


SECTION II 


DESCRIPTION OF TEACHER AND 
PUPIL MEASURES EMPLOYED IN 
THIS INVESTIGATION 


In the formulation of the plan of this in- 
vestigation a definite effort was made to 
include within its organization as many as 
possible of the desirable results of previous 
studies. Three significant conclusions drawn 
from previous investigations guided in the 
selection of measuring instruments: 


1. To measure successfully the character- 
istics of teaching ability it is necessary to 
measure a number of important aspects of 
teaching. 

2. To use desirable changes in pupils as a 
criterion of teaching success, it is necessary 
that measures of change other than that of 
academic achievement be included, —all 
changes in the pupil being important. 

3. Measures of teaching ability must be as 
valid, reliable, and objective as possible. 


TEACHER MEASURES 


The teacher measures employed were in the 
main the same as those employed in Rostker’s 
investigation. The reader is referred to Rost- 
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Taste III 
INFORMATION ABOUT THE SCHOOLS PARTICIPATING IN THIS STUDY 


Per 
Total Pupil Per 
School School Pupil 
Typeof Enroll- Valua- School 
School* ment tion Cost 
2-room $5,863 $115 
1-room 22 ,308 74 
1-room 114 
2-room 
1-room 
1-room 42 
1-room 
l-room 
1-room 
1-room 
1-room 
1-room 
1-room 
1-room 
1-room 
1-room 
1-room 
l-room 
1-room 
1-room 
l-room 
1-room 
1-room 
2-room 
1-room 
1-room 
2-room 
1-room 
1-room 


CoOAQur ONr 


Per 

Total Pupil Per 

School School Pupil 
Typeof Enroll- Valua- School 
School* ment tion Cost 
1-room $8,833 $ 75 
1-room 53 
l-room 
1-room 
2-room 
l-room 
1-room 
l-room 
l-room 
l-room 
l-room 
l-room 
1-room 
1-room 
2-room 
l-room 
l-room 
l-room 
l-room 
1-room 
l-room 
1-room 
l-room 
1-room 
l-room 
1-room 
2-room ’ 
1-room 7,111 


6 ,227 
12,315 65 


School 


16 ,625 
17 ,733 
20 ,667 
13 ,706 
14,600 
9,474 

5,320 


Ave. 2-room 
Ave. 1-room 21 


*A one-room school is to be interpreted as a school having one room and one teacher who teaches
grades I through VIII. A two-room school is to be interpreted as a school having two rooms and two
teachers, one teacher teaching grades I-IV, and the other teacher teaching grades V-VIII.


ker’s study (pages 15-20) for a description 
of these measures. Four measures were added, 
namely, 


1. A scale for evaluating the personal fitness
of teachers. (Mimeographed, Department
of Education, University of Wisconsin,
Madison, Wisconsin, 1937.) This teacher
rating scale consists of 33 teacher traits such
as accuracy, health, loyalty, sociability, thrift,
etc., each of which is rated on an eleven-point
scale (0-10). If the rater could think of no
teacher, teaching at the same grade level, who
was better with respect to a given trait, he
was instructed to check the "0" value. If the
rater could think of one teacher who was
better, the rating was to be "1", and so on.
As the rater recalled increasing numbers of
teachers who exceeded the teacher being rated
on any given trait, other values on the scale
were to be checked. The score a teacher receives
is the arithmetic average of the numbers
checked by the rater for the 33 traits
enumerated. Low scores are desirable since
they indicate that the rater could on the average
think of relatively few teachers who surpassed
the teacher being rated. High scores,
on the other hand, indicate that the rater
could readily think of numbers of teachers
who in his judgment were better with respect
to these traits.

Three raters rated each teacher on this 
scale, namely, the county superintendent of 
schools, the county supervisor, and the inves- 
tigator. The three resultant ratings were aver- 
aged to obtain the index that was used in 
subsequent analysis of this instrument. 
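The scoring rule just described (average the 33 trait checks for each rater, then average the three raters' scores) can be sketched as follows. The function names and the example ratings are hypothetical, introduced only for illustration.

```python
from statistics import mean

TRAIT_COUNT = 33  # number of traits on the personal-fitness scale


def rater_score(trait_ratings):
    """Average of one rater's 0-10 checks across the 33 traits."""
    if len(trait_ratings) != TRAIT_COUNT:
        raise ValueError("expected one rating per trait")
    if any(not 0 <= r <= 10 for r in trait_ratings):
        raise ValueError("ratings must lie on the 0-10 scale")
    return mean(trait_ratings)


def teacher_index(ratings_by_rater):
    """Average of the raters' scores; lower values are more favorable."""
    return mean(rater_score(r) for r in ratings_by_rater.values())


# Hypothetical ratings for one teacher from the three raters.
example = {
    "county superintendent": [2] * 33,
    "county supervisor": [3] * 33,
    "investigator": [1] * 33,
}
index = teacher_index(example)
print(index)
```

Because a check of "0" means the rater could recall no better teacher, a lower index marks a more favorably rated teacher.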

2. A personality rating scale—(Mimeo- 
graphed, Department of Education, Univer- 
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sity of Wisconsin, Madison, Wisconsin, 1937.) 
In this scale the rater was asked to assign 
one of eleven intensity values to each of 
six terms which are descriptive of the total 
personality effect the teacher has upon others. 
The six terms upon which teachers were to 
be rated are: pleasing, forceful, wholesome, 
interesting, stimulating, and confidence- 
inspiring. For each term, the rater checked 
either 0, 1, 2, ... or 10 depending upon
whether he could think of none, one, two, 
etc., or ten persons who were superior to the 
teacher being rated. The total score was the 
arithmetic average of the six numbers assigned 
by the rater and represents the number of 
persons he recalled who, in his judgment, 
were superior in total personality effect to the 
teacher rated. From this point of view low 
scores are desirable. The score on this scale 
used in subsequent discussions is the average 
of the ratings assigned to each teacher by the 
three raters as described above. 

3. A test of teacher-pupil relationship — 
(Department of Education, University of 
Wisconsin, Madison, Wisconsin, 1938. Mime- 
ographed form used by special permission of 
the authors, T. L. Torgerson; Bjarne Ullsvik;
and Lawrence Wahlstrom.) This test is de- 
signed to measure a teacher’s understanding 
of mental hygiene principles and their appli- 
cation to specific teacher and pupil situations. 
It consists of six parts. Part I is a measure 
of a teacher’s understanding of symptoms and 
causes of pupil maladjustments. Part II is 
used to appraise classroom situations. Twenty- 
eight behavior traits found in a typical class- 
room are listed. The teacher is to indicate on 
a four-point scale the frequency of occurrence 
of each behavior trait in her classroom. Part 
III tests the teacher’s power to evaluate 
behavior traits. Part IV is a measure of the 
teacher’s ability to successfully apply prin- 
ciples of mental hygiene. Twenty descriptions 
of child behavior common to the ordinary 
classroom are listed. From a master list of 
37 procedures for caring for behavior situa- 
tions, a teacher selects the ones she deter- 
mines to be applicable. Part V sets forth 
eleven problem situations that frequently 
arise in the classroom. From among 25 pro- 
cedures, the teacher selects those which are 
most desirable for correcting the particular 
situation. Part VI is designed to secure a 
measure of each teacher’s classroom practices. 
Forty-eight common classroom practices are 


listed. The teacher indicates whether she uses 
each practice always, usually, sometimes, 
rarely or never. 

4. The Sims score card for socio-economic 
status, Form C.—(Public School Publishing 
Company, Bloomington, Illinois, 1927.) The 
purpose of this Score Card is to provide a 
simple, convenient, and objective device for 
ascertaining and recording the general cultural,
social, and economic background of
those to whom it is applied. It may be used for 
determining the socio-economic status of any 
social group. It was used in this investigation 
for the purpose of obtaining numerical ratings 
which would permit a statistical study of 
socio-economic status as a factor in teaching 
efficiency. Home conditions need no longer be 
recorded as good, average, or poor, but may 
be given a numerical rating that is far more 
precise than the usual verbal characterizations. 

The specific area measured by a certain 
test is often difficult to determine. Most in- 
telligence examinations contain a share of 
material that measures information. Rating 
scales possess items that refer to many differ- 
ent characteristics of teachers. However, the 
eighteen measures used in this study may be 
grouped as follows: 


Intelligence: 
American Council Psychological Examina- 
tion. 
Teachers College Psychological Examina- 
tion. 
Knowledge of subject matter: 
American Council Civics and Government 
Test. 
Hartmann—Public Problems Information 
Test. 


Personality: 

The Bernreuter Personality Inventory: Bn, 
Bd, and Bs. 

Morris Trait Index—L. 

Washburne—Social Adjustment Inventory. 

Barr and Others—A Scale for Evaluating 
the Personal Fitness of Teachers. 

Barr and Others—Personality Rating Scale. 


Teacher Rating Scales: 
Torgerson Diagnostic Teacher Rating Scale 
of Instructional Activities.
Almy-Sorenson Rating Scale for Teachers. 
The Michigan Teacher Rating Scale. 
Aptitudes:
Stanford Educational Aptitudes Test. 
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TABLE IV 
PUPIL TESTS: MEANS AND STANDARD DEVIATIONS FOR THE TOTAL POPULATION OF 404 PUPILS


Final Means 


Test Pre-test 
96. 41 


Wrightstone: 
Civic Beliefs___- - - 
Generalizations. _ - 
Abilities 


94.93 
40.49 
89. 58 


8.12 
12. 23 
9.27 


71.48 
92.04 


Chronological Age: - -- 
Mental Age: 


Attitudes: 
Hartmann—Social Attitudes of Secondary 
School Teachers. 
Wrightstone—Scale of Civic Beliefs. 
Lewerenz—Steinmetz — Orientation Test 
Concerning Fundamental Aims of Edu- 
cation. 
Yeager—Scale for Measuring Attitudes 
Toward Teachers and Teaching. 
Torgerson—A Test of Teacher-Pupil Rela- 
tionship. 
Socio-Economic Status: 
Sims Score Card for Socio-Economic Status. 


The pupil tests were the same as those 
employed by Rostker, except that the two unit 
tests were revisions of those used by Rostker. 
The Sims Score Card was not used with pupils 
as in Rostker’s study. 


Final Standard Deviations 
Gain T Pre- Gain 


81. 48 
40.10 
72.81 


11.18 
20.79 
10. 89 


34. 87 
47.10 


404 

404 157.75 
404 101.75 
4 73.91 


SECTION III 


THE DEVELOPMENT OF THE 
CRITERION OF TEACHING 
ABILITY 


The criterion of teaching ability employed 
in this investigation was that of measurable 
changes produced in pupils. In general the 
same tests, except as already noted, and the 
same procedures were employed in develop- 
ing the criterion in this study as in Rostker’s 
study. 

The means and standard deviations for the 
final, initial, and change scores for each of 
the eight pupil tests were calculated. These 
measures with the means, standard deviations, 
and the number of pupils in each class for 
the 404 pupils in this study are listed in 
Table IV. Nine separate criteria of teaching 
ability were calculated for each teacher, one 


TABLE V 
INTERCORRELATIONS OF PUPIL TESTS AS MEASURES OF FINAL STATUS 
N = 404 


Hi H2 
. 565 . 573 
. 568 . 549 
. 478 
. 382 
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TaBLE VI 
INTERCORRELATIONS OF PUPIL TESTS AS MEASURES OF INITIAL STATUS 


N = 404


H; H: 
‘ . 526 
. 488 


TABLE VII
INTERCORRELATIONS OF PUPIL TESTS AS MEASURES OF CHANGE
N = 404


for each pupil gain measure, and one from a 
combination of the eight measures taken 
together. 

In order to derive the composite score it 
was necessary to consider the intercorrelations 
of the tests. The intercorrelations of the eight 
final test scores, the eight initial test scores, 
and the eight change scores were calculated 
with the results reported in Tables V, VI, and 
VII. A test was made to determine how large 
the intercorrelations must be in order to be 
significantly different from zero. Correlations 
larger than .10 were taken to indicate the 
presence of a real relationship with consider- 
able assurance. 
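The text does not name the test used to arrive at the .10 threshold. As a sketch only, assuming the usual t test for a correlation coefficient with N - 2 degrees of freedom and a normal approximation to t (reasonable for 404 pupils), the smallest correlation significantly different from zero can be computed as follows.

```python
import math
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Smallest |r| significantly different from zero (two-tailed).

    Uses the normal approximation to the t distribution, which is
    adequate for samples as large as the one in this study.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    df = n - 2
    return z / math.sqrt(z * z + df)


r_5pct = critical_r(404, alpha=0.05)
r_1pct = critical_r(404, alpha=0.01)
print(round(r_5pct, 3), round(r_1pct, 3))
```

Under these assumptions a correlation of about .10 sits just at the 5% level for N = 404, while the 1% level requires a correlation closer to .13.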

To obtain the UWH composite pupil gain, 
the mean pupil gains (Table A,,)* for the 


* See original thesis on file University of Wisconsin Library. 


eight tests were divided by the appropriate 
pooled sigmas and then composited according 
to their reliabilities in a 1:1:7:7:7:1:1:1 
ratio with the heaviest weights to the three 
Wrightstone tests. To secure a single yard- 
stick for the initial and final tests the devia- 
tion scores were divided by a pooled sigma 
developed from the initial and final sigmas 
according to the following formula: 


$$\sigma_{pooled} = \sqrt{\frac{\sigma_{Initial}^{2} + \sigma_{Final}^{2}}{2}}$$


The initial, final, and pooled sigmas are 
reported in Table VIII. 

The resultant weighted average became:

$$\text{UWH composite gain} = .082U_{1g} + .073U_{2g} + .357W_{1g} + .539W_{2g} + .504W_{3g} + .322H_{1g} + .304H_{2g} + .320H_{3g}$$
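
As a rough check on the weighting, the sketch below recomputes the first coefficient. It assumes that the first pair of sigmas in Table VIII (11.84 initial, 12.56 final) belongs to the first Unit test and that this test carries a ratio weight of 1; the function names are illustrative only.

```python
from math import sqrt


def pooled_sigma(sigma_initial: float, sigma_final: float) -> float:
    """Pooled sigma: root of the mean of the initial and final variances."""
    return sqrt((sigma_initial ** 2 + sigma_final ** 2) / 2)


def composite_weight(ratio_weight: float, sigma_initial: float, sigma_final: float) -> float:
    """Weight applied to a raw gain: ratio weight divided by the pooled sigma."""
    return ratio_weight / pooled_sigma(sigma_initial, sigma_final)


# Assumed pairing: first row of Table VIII with a ratio weight of 1.
u1_weight = composite_weight(1, 11.84, 12.56)
print(round(u1_weight, 3))
```

On these assumptions the result rounds to .082, the coefficient of the first term in the composite.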








TABLE VIII 
THE INITIAL, FINAL, AND POOLED SIGMAS FOR THE EIGHT PUPIL MEASURES 


Pupil Test
Health
Community Planning
Abilities
Civic Beliefs
Generalizations
Hill—Attitudes
Hill—Information
Hill—Action


Standard Deviations

Initial Final
11.84 12.56
14.04

16.82

13.43

12.34

3.19

3.48

3.10
The average UWH composite pupil gain,
increased by 10, is listed for each class in 
column (2) of Table IX. The constant 10 
was added throughout merely for the con- 
venience of eliminating minus signs. 

The eight separate measures of pupil change
were thus reduced to the U, W, and H composites,
which represent the average progress
made by pupils of the 47 classes toward the
attainment of the educational goals set forth,
as measured by the eight tests. These three
composites represent three separate criteria of
teaching ability.

One of the major purposes in combining the
eight tests into three composites,
U, W, and H, was to increase the reliabilities
of the tests, particularly as measures of
change. The next step was to calculate the


TABLE IX 
OBSERVED, PREDICTED, AND RESIDUAL PUPIL GAINS FOR THE 47 CLASSES 


Columns: (1) Class and Teacher Number; (2) Observed UWH comp. pupil gain; (3) Predicted UWH comp. pupil gain; (4) Residual pupil gain, PGTA (37-38); (5) Rank of Teacher, PGTA (37-38)

, 12.
. 48


4.
—b. 
2. 
4. 
9. 
4. 
1. 
—2. 
5. 
—9. 
4. 
—4, 
5. 
7. 
—T. 
—5. 
5. 
5. 
—4. 
5. 
3. 
4. 
—l. 
2. 
2. 
4. 
—4. 
4. 
—4. 
8. 
1. 
4. 
2. 
—T. 


TABLE X


F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF PUPIL
FACTORS FOR 57 CLASSES


Pupil Factors


I.Q.
Reading
Unit Composite Initial Score
Wrightstone Composite Initial Score
Hill Composite Initial Score


F Chi-Square
1.067 90.305
90.926
110.795
49.531
67.115


2.247 36.376





5% Value
1% Value


reliabilities of the final, initial, and change 
composites. This was done according to a 
formula taken from Kelley.’ These reliabilities 
are given below: 


RELIABILITIES OF COMPOSITES OF PUPIL
TEST SCORES

Final Initial 

. 7172 
. 901 


- 824 
. 928 


Change 
. 464 


. 690 
. 187 
. 684 


Unit Composite 
Wrightstone Com- 


These composite reliabilities are in most 
instances substantially higher than the indi- 
vidual test reliabilities. The reliabilities of the 
change scores are still substantially lower than 
were those of the final and initial scores. To 
further stabilize these change scores a single 
composite score was developed from the U, 
W and H composites. 

In order, however, to make comparisons of
the changes purportedly produced by each of
the 57 teachers with her group of pupils it
was necessary that the groups be homogeneous.
Tests of homogeneity were made with respect
to M.A., I.Q., Reading, U composite initial
test score, W composite initial test score, H
composite initial test score and UWH composite
initial test score. Table A (Appendix
A)⁸ lists the means and standard deviations
for pupil C.A., M.A., I.Q., and Reading, for
each class and for the total population. To
test the homogeneity of the 57 groups the F
test as described by Snedecor⁹ was employed.


7 T. L. Kelley, Statistical Method (New York: Macmillan
Company, 1923), p. 194, Formula 147.
8 See original thesis on file, Library, University of Wisconsin.
9 G. Snedecor, Statistical Methods (Ames, Iowa:
Collegiate Press, 1938), pp. 182-189 (Section 10.4).

"5% level of significance" means that if the experiment
could be repeated a hundred times under exactly the same
conditions each time, a difference as big as the
one observed would occur due to chance factors only five times
out of this hundred. Whether one uses a 5% level or a
1% level is in the last analysis a matter of ...

The 5% level is ... used in statistical ...


1.38 
1. 57 


. 196 


If the value of F is larger than the 5% or
1% value, the hypothesis that the means of
the 57 classes are homogeneous is rejected
at that level.
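
A minimal sketch of the F ratio for homogeneity of class means follows. It assumes the usual one-way analysis of variance (between-class mean square over within-class mean square); the three small classes are invented for illustration and are not data from this study.

```python
def f_ratio(groups):
    """One-way analysis-of-variance F for a list of lists of pupil scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for three classes, for illustration only.
toy_classes = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_ratio(toy_classes)
print(f_value)
```

The computed F is then compared with the tabled 5% and 1% values for the corresponding degrees of freedom.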

The Chi-square test¹⁰ was employed to test
the homogeneity of class variances. The 
values of F and Chi-square obtained from 
these tests as well as the corresponding 5% 
and 1% values necessary to determine 
whether the calculated values refute the 
hypothesis of homogeneity or not are listed 
in Table X on the following page. 

An inspection of Table X reveals that the 
values of F, with the exception of M.A., are 
much higher than the 5% value which indi- 
cates that the means of the 57 classes relative 
to these factors are too discrepant to be con- 
sidered homogeneous. It was further observed 
that the Chi-Square values, with the excep- 
tion of Unit Composite Initial score and 
Wrightstone Composite Initial score are 
higher than the 5% value which indicates 
that the classes are not homogeneous with 
respect to their variances on these factors. 
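
Because Chi-square tables of the period stopped at 30 degrees of freedom, large values had to be referred to the normal curve. The sketch below uses Fisher's approximation, in which the square root of twice Chi-square, less the square root of twice the degrees of freedom minus one, is treated as a normal deviate. The degrees of freedom (56, that is, 57 classes less one) are an assumption, and the two Chi-square values are taken from Table X without asserting which pupil factor each belongs to.

```python
from math import sqrt


def chi_square_normal_deviate(chi_square: float, df: int) -> float:
    """Fisher's large-df approximation: sqrt(2*chi2) - sqrt(2*df - 1)."""
    return sqrt(2 * chi_square) - sqrt(2 * df - 1)


DF = 56  # assumed: 57 classes minus 1

z_large = chi_square_normal_deviate(90.305, DF)
z_small = chi_square_normal_deviate(67.115, DF)
print(round(z_large, 2), round(z_small, 2))
```

A deviate near 2.9 lies well beyond the one-sided 5% point of the normal curve (about 1.64), while one near 1.05 does not, which matches the pattern of significant and non-significant Chi-square values described above.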

This variation of pupil factors has been 
observed in most researches of this sort and 
requires the elimination of a number of pupils 
to arrive at homogeneous groups. It was nec- 
essary in this investigation to arrive at as 
many homogeneous groups as possible with 
respect to class means of M.A., I.Q., Reading,
Unit Composite Initial score, Wrightstone
Composite Initial score, Hill Composite Ini- 
tial score and the UWH composite. By trial 
and error those classes having extreme vari- 
ances on the pupil factors considered were 
eliminated. Ten classes were finally deleted
in order that the remaining classes might represent
a group whose variances were homogeneous
in the several pupil factors. These
classes were numbered 1, 5, 10, 13, 24, 39,
49, 52, 54, and 55. All subsequent calculations
were based upon the remaining 47
classes with a total of 338 pupils. Table XI
lists the Chi-square and F test values for testing
the homogeneity of the means and standard
deviations of the remaining 47 classes
together with the 5% and 1% values which
are necessarily based upon somewhat different
degrees of freedom than for the previous 57
classes.


10 P. R. Rider, An Introduction to Modern Statistical Methods
(New York: John Wiley and Sons, 1939).
11 R. A. Fisher, Statistical Methods for Research Workers
(London: Oliver and Boyd, 1934), p. 62. Since Chi-square
tables do not go beyond n = 30, the expression
$\sqrt{2\chi^2} - \sqrt{2n - 1}$ was used, which gives a normal value, and the value of
"x" for 5% and 1% was read from Fisher’s Tables. Values
of "x" may be either plus or minus.


TABLE XI
F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF PUPIL
FACTORS FOR 47 CLASSES


Pupil Factors


Unit Composite Initial Score
Wrightstone Composite Initial Score
Hill Composite Initial Score


F       Chi-Square
1.087   64.606
1.832   34.754
2.016   64.849
4.007   37.209
4.013   46.151
2.230   29.831
3.388   65.242


5% Value
1% Value

Inspection of Table XI reveals that all 
classes are consistent with the hypothesis of 
homogeneous variances. Only the M.A. factor 
is within the 1% limit for the “F” value. 


1.40 
1. 57 


The UWH composite of the mean pupil
changes in the eight pupil measures for these
47 classes was employed in the further development
of the criterion. These mean pupil
changes, however, are assumed to be a func- 
tion of three factors, (1) the efforts of the 
teacher, (2) the abilities of pupils, and (3) 
other factors not measured. It was assumed 
that the 47 classes were homogeneous on all 
factors not measured in this investigation. 
Any differences in mean pupil changes among 
the 47 classes may then be said to be a func- 
tion of the efforts of the teacher and pupil 
abilities plus constant factors. 

To secure an index of the teacher’s efforts 
with the effect of pupil abilities such as M.A., 
I.Q., Reading, and previous knowledge of the


TABLE XII


INTERCORRELATIONS OF PUPIL TESTS AS MEASURES OF INITIAL STATUS WITH PUPIL
TESTS AS MEASURES OF CHANGE


N = 338


Initial Status 


U: 
—.010 


W; 


. 059 


MBeaSRSoee & 


ed 


—. 378 ‘ . 032 

—. 006 

—. 028 ‘ —. 415 

. 012 . . 057 

; O71 ‘ . 003 
.012 .013 ‘ . 067 
. 038 . . 073 


U1—Unit I—Health Test.
U2—Unit I—Community Planning.
W1—Wrightstone—Ability to Organize Research Material.
W2—Wrightstone—Scale of Civic Beliefs.
W3—Wrightstone—Ability to Generalize.
H1—Hill Information.
H2—Hill Attitude.
H3—Hill Action.
Uc—Unit Composite.
Wc—Wrightstone Composite.
Hc—Hill Composite.
ric—Correlation between initial status and change.
subject held constant, a series of intercorrelations
were calculated. The intercorrelations
of pupil tests as measures of initial status and
as measures of change are given in Table XII.
The correlations of I.Q., M.A., and Reading
with the pupil tests as measures of initial
status, final status, and pupil change are given
in Tables XIII, XIV, and XV.


In developing the regression equation for 
partialing out the effect of variable pupil 
abilities between classes the intercorrelations 


TABLE XIII


CORRELATIONS OF I.Q., M.A., AND READING
WITH PUPIL TESTS AS MEASURES
OF INITIAL STATUS


N = 404


I.Q. M.A. Reading 
. 451 ‘ . 571 
.444 
, 345 
. 807 
. 480 
. 580 
. 479 
. 499 


. 476 
. 323 
. 385 
. 407 
. 377 
. 385 
. 304 


TABLE XIV


CORRELATIONS OF I.Q., M.A., AND READING
WITH PUPIL TESTS AS MEASURES
OF FINAL STATUS


N = 404


M.A. 
- 625 
- 615 
. 543 
. 872 
- 495 
. 547 
- 555 
- 488 


TABLE XV


CORRELATIONS OF I.Q., M.A., AND READING
WITH PUPIL TESTS AS MEASURES
OF CHANGE


N = 404


I.Q.

. .039 
—. 008 

. 049 


M.A. 


. 026 
. 075 
—. 002 
—. 005 . 023 
011 . 082 . 040 
. 087 . 181 . 051 
. 089 . 091 - 131 
. 045 . 081 . 072 


Reading 
. 030 
. 182 
. 042 
. 126 


of each pupil factor with the mean pupil 
change were then calculated. These are given 
in Table XVI. 

The beta coefficients for the regression
equation for predicting UWH composite gain
from M.A., I.Q., Reading, and UWH composite
pre-scores were obtained from the
above correlation table by Aitken’s method of
pivotal condensation.¹² The resultant means,
sigmas, beta coefficients and regression coefficients
are given in Table XVII.

The constant term in the regression equa- 
tion is equal to —47.35. The regression equa- 
tion is written thus: 

$$X_0 = .613X_1 - .497X_2 + .254X_3 - .511X_4 - 47.35$$

where $X_0$ = predicted average UWH composite change.
      $X_1$ = M.A. class mean.
      $X_2$ = I.Q. class mean.
      $X_3$ = Reading class mean.
      $X_4$ = UWH composite initial status class mean.
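
The step from beta coefficients to raw-score regression coefficients can be sketched as follows. The conversion multiplies each beta by the ratio of the criterion sigma to the predictor sigma. The pairing of sigmas with variables (7.30 for M.A., 5.99 for I.Q., 7.25 for the composite gain) is inferred from the surviving fragments of Table XVII and should be read as an assumption.

```python
def raw_score_coefficient(beta: float, sigma_criterion: float, sigma_predictor: float) -> float:
    """Convert a standardized (beta) weight to a raw-score regression weight."""
    return beta * sigma_criterion / sigma_predictor


SIGMA_GAIN = 7.25  # assumed sigma of the UWH composite gain (Table XVII)

b_mental_age = raw_score_coefficient(0.617, SIGMA_GAIN, 7.30)  # assumed M.A. sigma
b_iq = raw_score_coefficient(-0.411, SIGMA_GAIN, 5.99)         # assumed I.Q. sigma
print(round(b_mental_age, 3), round(b_iq, 3))
```

With this pairing the two results round to .613 and -.497, the first two coefficients of the regression equation.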


A predicted average pupil change was cal- 
culated for each of the 47 classes by means 
of this prediction equation. These are listed 
in column 3 of Table IX. 

The observed pupil change on the
UWH composite, $C_o$, less the predicted pupil
change, $C_p$, gave an amount, $C_t$, thought to be
due to the influence of the teacher; that is,
$C_o - C_p = C_t$. Table IX¹³ lists the values of
$C_o$, $C_p$, and $C_t$ for each of the 47 classes.
Thus column 4 in Table IX lists an index of
teaching ability for each of the 47 teachers.
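
The residual index can be illustrated with a short sketch that applies the prediction equation to one class. The class means and the observed gain below are hypothetical figures chosen for illustration; only the coefficients come from the equation reported in this section.

```python
def predicted_gain(ma: float, iq: float, reading: float, uwh_initial: float) -> float:
    """Predicted average UWH composite change for a class (reported equation)."""
    return 0.613 * ma - 0.497 * iq + 0.254 * reading - 0.511 * uwh_initial - 47.35


def teacher_index(observed_gain: float, ma: float, iq: float,
                  reading: float, uwh_initial: float) -> float:
    """Residual gain: observed change less the change predicted from pupil factors."""
    return observed_gain - predicted_gain(ma, iq, reading, uwh_initial)


# Hypothetical class means and observed gain, for illustration only.
c_p = predicted_gain(ma=150, iq=100, reading=60, uwh_initial=10)
c_t = teacher_index(8.0, ma=150, iq=100, reading=60, uwh_initial=10)
print(round(c_p, 2), round(c_t, 2))
```

A positive residual marks a class that gained more than its pupil factors alone would predict.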


SECTION IV 


STATISTICAL VALIDITY OF SELECTED 
TEACHER MEASURES 


The criterion developed in the previous
section represents an objective measure of
teaching ability. It is the purpose now to
determine the statistical validity of certain
teacher measures by studying their relation
to the established criterion.

The list of teacher measures studied follows:¹⁴

12 Godfrey H. Thomson, The Factorial Analysis of Human
Ability (New York: Houghton Mifflin Company, 1939), pp.
13 See Table A,,, original thesis on file, Library, University
of Wisconsin.
14 For a description of these measures see pp. 15-20.


TABLE XVI
CORRELATIONS EMPLOYED IN THE PREDICTION OF PUPIL FACTORS IN UWH COMPOSITE CHANGE
N = 47 


—. 045 


2 
. 906 


1.000 


. 556 : 
566 . 320 
. 253 


TABLE XVII
MEANS, SIGMAS, BETA AND REGRESSION COEFFICIENTS


Means


Beta
Coefficients
.617
-.411


Regression
Coefficients
.613


Sigmas
7.30
5.99
11.68
10.53
7.25


T1  American Council on Education Psychological
    Examination for College Freshmen—1936 Edition.
T2  Psychological Examination Prepared for
    the Teachers College Personnel Association—Form C—1938 Edition.
T3  American Council Civics and Government
    Test—Form B for High Schools and Colleges.
T4  Hartmann—Public Problems Information Test.
T5  Bernreuter—Personality Inventory, Bn.
T6  Bernreuter—Personality Inventory, Bd.
T7  Bernreuter—Personality Inventory, Bs.
T8  Morris Trait Index "L".
T9  Washburne—Social Adjustment Inventory—Thaspic Edition.
T10 A Scale for Evaluating the Personal Fitness
    of Teachers, unpublished material,
    University of Wisconsin.
T11 Personality Rating Scale, unpublished
    materials, University of Wisconsin.
T12 Torgerson Diagnostic Teacher Rating
    Scale of Instructional Activities.
T13 Almy-Sorenson Rating Scale for Teachers.
T14 Michigan Teacher Rating Scale.
T15 Stanford Educational Aptitudes Test, T-A.
T16 Stanford Educational Aptitudes Test, A-R.
T17 Stanford Educational Aptitudes Test, T-R.
T18 Hartmann—Social Attitudes of Teachers.

T19 Wrightstone—Scale of Civic Beliefs,
    Forms A and B combined.
T20 Lewerenz—Steinmetz—Orientation Test,
    1935 Revision.
T21 Yeager—Scale for Measuring Attitudes
    Toward Teaching and the Teaching Profession.
T22 A Test of Teacher—Pupil Relationship by
    Torgerson, Ullsvik, and Wahlstrom.
T23 Sims Score Card for Socio-Economic
    Status, Form C.


Additional data were obtained as follows:


T24 Age of the teacher.
T25 Teaching experience—in years.
T26 Professional training in years above high
    school.
T27 Size of school—number of pupils.
T28 Size of class—number of pupils.
T29 Per-pupil cost per year.
T30 Salary of teacher per month.
Ct  Criterion Score based upon pupil change.


Table A,, Appendix A,¹⁵ reports the raw
scores for the thirty measures listed, the criterion
scores for all teachers, and the means
and standard deviations for each for the group
of 47 teachers participating in the investigation.
The correlations of the teacher measures
with the criterion are given in Tables XVIII
and XIX. The intercorrelations among the
several teacher measures are reported in
Table XX.


15 See original thesis on file, Library, University of Wisconsin.





TABLE XVIII


INTERCORRELATIONS OF TEACHER MEASURES
WITH CRITERION
N = 47


American Council Psychological Test                   -.10
Teachers College Psychological Test                    .05
American Council Civics and Government Test
Hartmann Information Test
Bernreuter Personality Inventory Bn                   -.14
Bernreuter Personality Inventory Bd                    .04
Bernreuter Personality Inventory Bs                   -.11
Morris Trait Index L
Washburne Social Adjustment Inventory
Scale of Personal Fitness of Teachers
Personality Rating Scale—U. of Wis.
Torgerson Diagnostic Rating Scale
Almy-Sorenson Rating Scale for Teachers
Michigan Rating Scale
Stanford Aptitudes Test T-A
Stanford Aptitudes Test A-R
Stanford Aptitudes Test T-R
Hartmann Social Attitudes Test
Wrightstone Scale of Civic Beliefs
Orientation Test, 1935 Edition
Yeager Scale for Measuring Attitudes Toward Teachers and Teaching
Torgerson Teacher-Pupil Relationship
Sims Socio-Economic Status

Additional Data

Age of the Teacher
Teaching Experience in Years
Professional Training above High School
Per Pupil Cost Per Year
Salary of the Teacher Per Month


TABLE XIX


INTERCORRELATIONS OF TEACHER MEASURES
WITH CRITERION, ARRANGED IN
ORDER OF SIZE


Torgerson Teacher Rating Scale
Michigan Rating Scale
Hartmann Social Attitudes Test
Almy-Sorenson Teacher Rating Scale
Test of Personal Fitness—Charters
Size of School—Number of Pupils
Personality Test—U. of Wis.
Wrightstone Scale of Civic Beliefs
Salary of Teacher—per month
Torgerson Teacher-Pupil Relationship
Yeager Scale for Measuring Attitudes Toward Teachers and the Teaching Profession
Experience of the Teacher—in Years
Size of the Class—Number of Pupils
Stanford Aptitudes Test T-A
Washburne Social Adjustment Inventory
Teachers College Psychological Examination
Bernreuter Personality Inventory Bd
Age of the Teacher
Hartmann Information Test
Per Pupil Cost                                        -.06
Training of Teacher Above High School                 -.09
American Council Psychological Examination
Bernreuter Personality Inventory Bs                   -.11
Stanford Aptitudes Test T-R
Bernreuter Personality Inventory Bn                   -.14
Stanford Aptitudes Test A-R                            .15
Sims Socio-Economic Status
Morris Trait Index L


The correlations between the various
teacher measures and the criterion may be
more easily interpreted when grouped under
the following categories:


A. Intelligence:
   1. American Council on Education Psychological Examination.
   2. Psychological Examination Prepared for the Teachers College Personnel Association.
B. Knowledge of Subject Matter:
   1. American Council Civics and Government Test.
   2. Hartmann—Information Test on Public Problems.
C. Personality:
   1. Bernreuter Personality Inventory, Bn.
   2. Bernreuter Personality Inventory, Bd.
   3. Bernreuter Personality Inventory, Bs.
   4. Morris Trait Index "L".
   5. Washburne Social Adjustment Inventory.
   6. A Scale for Evaluating the Personal Fitness of Teachers.
   7. A Personality Rating Scale.
D. Rating Scales for Teachers:
   1. Torgerson Diagnostic Rating Scale of Instructional Activities.
   2. Almy-Sorenson Rating Scale for Teachers.
   3. Michigan Rating Scale.
TABLE XX
INTERCORRELATIONS OF TEACHER TEST SCORES
N = 47


Test Number 1 2 


American Council Psychological Exam. 1. 1. 000 . 727 
Teachers College Psychological 2. . 727 1. - 
American Coun. Civics k j 
Hartmann Information ‘ . 123 “334 
Bernreuter Bn ' _—, —.094 
—. . 067 
. 249 
. 135 
3 ‘ . 283 
Personal Fitness \ d . 087 
Personality ‘ ° . . 002 
Torgerson Ratin Scale q . . 065 
Almy-Sorenson Rating----------.----- . P . 126 
Michigan Rating : d . 155 
Stanford T-A ‘ ‘ —. 048 
es, tbe eaecetaness ‘ é —. 052 
Stanford T-R : , . 008 —. 093 
Hartmann Attitudes ‘ . 329 . 481 
Wrightstone Civic Beliefs ’ . 285 . 307 
Orientation Test ’ . 703 . 442 
Yeager am . 094 —.074 
Torgerson Teacher-Pupil SD. » . 163 .211 
Sims Socio-Economic 23. . 181 . 228 
eee ot Gee Teneeel...............-- : . . 444 . 434 
Experience of Teacher 3 . 430 . 432 
Training Above H.S - , . 323 . 489 
i ; . 065 . 274 
. . 005 . 069 
Per Papi Cost , . 043 —.117 
Salary of Teacher ‘ . 167 . 284 
Sie ES ae a arr : —. 095 . 050 


TABLE XX—Continued 
9 10 


. 486 ‘ . 049 

. 135 : . 087 
. 086 
. 148 
- 110 
. 089 
. 030 
. 057 
. 457 
. 000 
. 901 
. 839 
. 891 
. 855 
. 128 
. 044 
. 037 
. 374 
. 193 
. 087 
. 030 
. 021 
. 088 
. 244 
. 259 
. 129 
. 203 
. 076 
. 126 
. 231 
. 352 
TABLE XX—Continued


17 18 19 
—. 008 . 829 . 285 
. —093 . 481 . 807 

. 024 . 234 - 295 
—. 212 -811 
—. 104 . 285 
. 271 . 251 
—. 054 . 234 

-261 . .090 

‘ . 319 

. 874 


— AS 


: 438 


le on a Ci a ea aa 


—. 005 
—. 007 

. 041 
—. 032 
—. 016 
—.014 
—. 128 


TABLE XX—Continued 


26 27 28 


. 323 . 065 —. 005 
. 489 . 274 . 069 
. 099 - 219 . 197 
. 215 . 309 271 
. 149 —.314 —. 300 
. 168 - 485 - 480 
. 232 - 207 . 232 
—. 120 . 054 —. 026 
. 064 . 103 .179 
. 129 - 203 . 076 
. 065 . 164 . 055 
171 . 298 . 194 
. 135 . 269 . 125 
. 169 . 335 . 246 
. 033 . . 034 
, . 009 
. 032 

. 044 

-191 

. 128 

. 210 

. 060 

.172 

. 396 

- 432 

. 270 

. 813 

. 000 

- 600 

. 496 
E. Aptitudes and Attitudes:
   1. Stanford Educational Aptitudes Test, T-A.
   2. Stanford Educational Aptitudes Test, A-R.
   3. Stanford Educational Aptitudes Test, T-R.
   4. Hartmann—Social Attitudes of Teachers.
   5. Wrightstone Scale of Civic Beliefs.
   6. Lewerenz—Steinmetz—Orientation Test, 1935 Revision.
   7. Yeager Scale for Measuring Attitude Toward Teachers and the Teaching Profession.
   8. Torgerson et al—A Test of Teacher-Pupil Relationship.
F. Socio-Economic Status:
   1. Sims Score Card for Socio-Economic Status, Form C.


A study of the criterion correlations shows 
that only seven are statistically significant. Of 
these seven measures, five are rating scales. 
The other two measures are: Hartmann Social 
Attitudes Test, .384; and size of school 
(number of pupils), .312. The Wrightstone 
Scale of Civic Beliefs, .285; salary of teach- 
ers, .223; Torgerson Teacher—Pupil Relation- 
ship Tests, .222; and Yeager Scale for Meas- 
uring Attitudes Toward Teachers and the 
Teaching Profession, .221 approach signifi- 
cance. Taken separately none of these meas- 
ures have any very great validity when 
checked against the criterion here employed. 
In general the correlations reported for this
study are lower than those reported by
Rostker.¹⁶ This may arise in part from the
fact that a larger amount of the pupil gain
was attributable to the teachers and other
uncontrolled factors in Rostker’s study.
Forty-four per cent of the pupil gain variance is
attributable to the teacher factor in Rostker’s
study and only 24% to the teacher factor in
Rolfe’s study; 27% of the variance is
attributable to pupil factors in Rostker’s study
and 48% in Rolfe’s study. Thirty per cent of
the variance is attributable to errors of
measurement in Rostker’s study and 29% in
Rolfe’s study. It will be recalled that Rolfe’s
teachers were principally teachers in one-room
rural schools; those in Rostker’s study were
in part from two-room state graded schools
and presumably better teachers.

16 Leon E. Rostker, Measurement and Prediction of Teaching
Ability, unpublished Doctor’s Dissertation, 1939; on file,
Library, University of Wisconsin, Madison, Wisconsin.


SECTION V 


A STUDY OF THE VALIDITY OF COMBINATIONS
OF SEVERAL TESTS


The correlations between teacher measures 
and the criterion have already been presented 
in Tables XVIII and XIX. The purpose of 
this section of this report is to present data 
relative to the predictive value of certain 
combinations of tests. The first combination 
to be studied is presented in Table XXI. 

By using Aitken’s Method of Pivotal Condensation¹⁷
the beta coefficients necessary for
computing the multiple correlations and multiple
regression equations were obtained.
Table XXII shows the results of these computations
based upon the nine teacher
measures selected for study.

The multiple correlations and their signifi- 
cance are indicated in Table XXIII. These 
data were obtained from the proper multipli- 


17 Godfrey H. Thomson, The Factorial Analysis of Human
Ability (New York: Houghton Mifflin Company, 1939), pp.


TABLE XXI 
RELATION BETWEEN CERTAIN MEASURES APPLIED TO TEACHERS AND TEACHING EFFICIENCY 
N = 47 


Torgerson— Diagnostic Teacher Rating Scale 
Hartmann—Social Attitudes of Teachers 
Wrightstone—Scale of Civic Beliefs 
Torgerson—Teacher-Pupil Relationship 


Yeager—Scale for Measuring Attitudes Toward Teachers and the Teaching Profession - - 


Morris—Trait Index L 


y Poor eee Bi nee: | Inventory Bn 


American Council Psychological Examination 
American Council Civics and Government Test 
TABLE XXII
BETA COEFFICIENTS FOR PREDICTION OF TEACHING SUCCESS


TABLE XXIII
MULTIPLE CORRELATIONS AND THEIR SIGNIFICANCE


R? 45—P 

_— P 
425. ; 44. 000 
498. ‘ 21. 500 
526. 7 14. 000 
.533 : 10. 250 
. ‘ 8. 000 
698 . ‘ 6. 500 
603. ‘ 5. 429 
, se ‘ 4.625 
a . ‘ 4.000 


y 

o 
| 
~ 
| 


COA Che 

PO PO PO POPO RO NO com 

Kd wo, 

BO 22 G0 G0 G0. Cm on 
_ one 


wo 
rs 


*Significant at 5% level; 5 chances in 100 from an uncorrelated random pop. 
**Significant at 1% level; 1 chance in 100 from an uncorrelated random pop. 


cations of the correlations and beta coeffi- 
cients in Table XXII. 
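
The "proper multiplications" can be sketched as the standard identity that the squared multiple correlation equals the sum of the products of each beta coefficient with the corresponding criterion correlation. The example uses only the one-predictor case, where the beta coefficient equals the criterion correlation; the value .425 is the first multiple R reported in this section, and the function name is illustrative.

```python
from math import sqrt


def multiple_r(betas, criterion_correlations):
    """Multiple R from beta weights and predictor-criterion correlations.

    Uses R^2 = sum(beta_i * r_i), the standard identity for standardized
    regression weights.
    """
    r_squared = sum(b * r for b, r in zip(betas, criterion_correlations))
    return sqrt(r_squared)


# With a single predictor the beta weight equals its criterion correlation.
r_one = multiple_r([0.425], [0.425])
print(round(r_one, 3), round(r_one ** 2, 3))
```

For more than one predictor the same function applies once the beta coefficients have been obtained by pivotal condensation or any equivalent solution of the normal equations.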

Two important facts stand out from Table 
XXIII: (1) With the addition of more vari- 
ables the multiple correlation increases in size; 
and (2) as more variables are added the error 
increases. The increase in the multiple R’s 
and the errors move along fairly parallel 
courses. When the 9th variable is added the
R is no longer significant at the 1% level. 
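
The significance test for a multiple R can be sketched from the columns that survive in the tables. The factor (45 - P)/P is read off the printed values (44.000, 21.500, 14.000, and so on for P = 1, 2, 3); treating its product with R²/(1 - R²) as the F ratio that was referred to the 5% and 1% points is an assumption, not a statement of the authors' exact procedure.

```python
def f_for_multiple_r(r_squared: float, n_predictors: int, df_constant: int = 45) -> float:
    """F ratio for a multiple correlation: R^2/(1 - R^2) times (45 - P)/P.

    The constant 45 is taken from the printed column of the table and is
    assumed to reflect the 47 teachers less the constraints imposed.
    """
    return r_squared / (1 - r_squared) * (df_constant - n_predictors) / n_predictors


f_one = f_for_multiple_r(0.181, 1)
print(round(f_one, 2))
```

Because the second factor shrinks quickly as P grows (44 for one variable, 4 for nine), a modest rise in R can fail to keep F above the 1% point, which is consistent with the loss of significance noted for the last variables added.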

From the relative contribution each inde- 
pendent variable makes to the next multiple 
correlation, it is possible to rank these vari- 
ables according to their potency. Therefore, 
in Table XXIII, the second column ranks 
these independent variables according to their 
contributions as follows: 


1. X, Torgerson Teacher Rating Scale. 

2. X, Hartmann Social Attitudes of 
Teachers. 

3. X, Yeager Scale for Measuring Atti- 
tudes Toward Teachers and the 
Teaching Profession. 

4. X, Wrightstone Scale of Civic Beliefs. 

5. X, American Council Psychological Examination.

6. X, Morris Trait Index L—Leadership. 
7. X, Torgerson Teacher—Pupil Relation- 
ship 


8. X, Bernreuter Personality Inventory, 
Bn. 

9. X, American Council Civics and Gov- 
ernment Test. 


It is understood that contributions to the 
multiple correlations will be in relation to 
those contributions made by previous tests. 
The reason for such a small increase in the 
multiple correlation for any particular meas- 
ure will be in terms of the contributions 
already made. Tests which contribute similar 
content may be discovered and use made of 
those items which add to the test results. 
Tests number 4, 7, and 9 appear to contribute 
little to the multiple correlation of .627 in 
the list above. 

Since the beta coefficients tell us the rela- 
tive importance of the contributions made by 
the independent variables in multiple correla- 
tion equations, it is possible to determine the 
meaning of each beta coefficient reported in 
the second column of Table XXII. 
With the use of the beta coefficients obtained
when the multiple correlations were
determined it is now possible to set up a re- 
gression equation based on the criterion of 
pupil change. When the criterion is expressed 
in standard score units as previously indi- 
cated, it is possible to express the regression 
equation in terms of raw scores on the inde- 
pendent variables—teacher measures—and 
the criterion—dependent variable—in terms 
of standard scores. The general form of this 
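regression equation can then be written as:¹⁸

$$X_c = \frac{\beta_{c1}}{\sigma_1}X_1 + \frac{\beta_{c2}}{\sigma_2}X_2 + \cdots - \left(\frac{\beta_{c1}}{\sigma_1}M_1 + \frac{\beta_{c2}}{\sigma_2}M_2 + \cdots\right)$$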

By substituting the proper beta coefficients, 
means, and sigmas in the above formula the 
following multiple regression equation results: 

$$z_{c \cdot x_1 \ldots x_9} = .032X_1 + .021X_2 + .006X_3 + .008X_4 + .496X_5 - .006X_6 - .001X_7 - \ldots\ .002X_9 - 6.413$$


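18 The general raw-score form of the regression equation is

$$X_c = \beta_{c1}\frac{\sigma_c}{\sigma_1}X_1 + \beta_{c2}\frac{\sigma_c}{\sigma_2}X_2 + \cdots + \left(M_c - \beta_{c1}\frac{\sigma_c}{\sigma_1}M_1 - \beta_{c2}\frac{\sigma_c}{\sigma_2}M_2 - \cdots\right)$$

If the mean of the criterion or dependent
variable equals zero and has a standard deviation
equal to one, this formula may be
expressed as:

$$X_c = \frac{\beta_{c1}}{\sigma_1}X_1 + \frac{\beta_{c2}}{\sigma_2}X_2 + \cdots - \left(\frac{\beta_{c1}}{\sigma_1}M_1 + \frac{\beta_{c2}}{\sigma_2}M_2 + \cdots\right)$$

The second formula was used in establishing the regression
equation by which predictions were made in this investigation.


This equation is tested by substituting for
the unknowns the appropriate mean scores
obtained from the group of 47 teachers. The
prediction of zero confirms the fact that the
predicted score for the average teacher would,
on the basis of standard scores, be equal to
zero.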
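
The check described here can be sketched in a few lines. The beta coefficients, means, and sigmas below are hypothetical figures, since the corresponding table did not survive legibly; only the form of the equation (each beta divided by its predictor sigma, with a constant that offsets the means) follows the text.

```python
def raw_score_equation(betas, means, sigmas):
    """Build raw-score weights and the constant for a standardized criterion."""
    coefficients = [b / s for b, s in zip(betas, sigmas)]
    constant = -sum(c * m for c, m in zip(coefficients, means))
    return coefficients, constant


def predict(scores, coefficients, constant):
    """Predicted criterion score, in standard-score units."""
    return sum(c * x for c, x in zip(coefficients, scores)) + constant


# Hypothetical values for three teacher measures, for illustration only.
betas = [0.40, 0.25, -0.10]
means = [60.0, 110.0, 50.0]
sigmas = [12.0, 15.0, 9.0]

coefficients, constant = raw_score_equation(betas, means, sigmas)
at_the_means = predict(means, coefficients, constant)
print(at_the_means)
```

Substituting the group means returns zero whatever the betas are, so the check guards against arithmetic slips in the constant term rather than testing the predictive value of the equation.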

A comparison of the predicted score of 
teaching ability with the actual score ob- 
tained will again test the accuracy of the 
regression equation for the group from which 
it was derived. With a multiple regression 
coefficient of .627 and a standard error of 
estimate of .779, it is possible to test the 
accuracy of the equation as a predicting in- 
strument by substituting the actual scores of 
a teacher obtained on the various tests. 
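
The standard error of estimate quoted here follows from the multiple R by the formula cited from Guilford, in which the criterion sigma is multiplied by the square root of one minus R squared. The sketch assumes a criterion sigma of 1, since the criterion is expressed in standard scores; the function name is illustrative.

```python
from math import sqrt


def standard_error_of_estimate(multiple_r: float, sigma_criterion: float = 1.0) -> float:
    """Standard error of estimate: sigma_criterion * sqrt(1 - R^2)."""
    return sigma_criterion * sqrt(1 - multiple_r ** 2)


see_first = standard_error_of_estimate(0.627)   # nine-measure equation
see_second = standard_error_of_estimate(0.640)  # ten-measure equation
print(round(see_first, 3), round(see_second, 3))
```

The two results round to .779 and .768, the values reported for the two regression equations. Roughly two thirds of predicted standard scores would therefore be expected to fall within about three quarters of a sigma of the obtained scores in the group from which the equations were derived.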


TABLE XXIV 
ILLUSTRATION OF THE USE OF THE REGRESSION EQUATION 
Teacher 


11 12 


66 54 
58 
109 
287 
3.2 
53 
216 
253 
65 


—.16 


scores.... —1.68 .91 . 03 


Key for tests listed in Table XXIV 
Torgerson Teacher Rating Scale. 
Hartmann Social Attitudes of Teachers. 
Wrightstone Scale of Civic Beliefs. 
Yeager Scale for Measuring Attitudes.
‘or Measuring Attitudes 
Morris Trait Index L 
Bernreuter Personality Inventory Bn 
American Council Psychological Examination. 
American Council Civics and Government Test. 


23 
Scores of teachers selected at random are
substituted below to illustrate the use of this
regression equation as a predicting instrument
of teaching ability: S.D.est = .779, using
the formula by Guilford.¹⁹ One would not expect
the same degree of accuracy for a new
group of teachers.

The second part of this section is concerned 
with the use of other teacher measures 
arranged in a different order than that used 
in the first prediction of teaching ability. 
Here the ten teacher tests having the highest 
absolute correlation with the criterion of 
teaching ability based on pupil change were 
used. This list was secured by eliminating 
the American Council Civics and Government 
Test and by adding the test of Personal Fit- 
ness and also Sims Score Card for Socio- 
Economic Status. The following teacher 
measures were selected: 


Torgerson Teacher Rating Scale 
Hartmann Social Attitudes of Teachers 


There appears to be considerable evidence 
supporting the significance of the multiple 
correlations obtained from both of the series 
of tests used in this study for the purpose of 
predicting teaching ability. 

When the beta coefficients, means, and 
sigmas are substituted in the prediction 
formula and the proper multiplications and 
divisions are carried out, the regression equa- 
tion becomes: 

Zo, x1..219 == -038X, + .018X, — .042X, 
+ .0o9X, + .007X, + .449X, — .o11X, 
— .031X, — .oo1X, — .003X,, — 5.690. 

Test scores of teachers selected at random 
are substituted below to illustrate the use of 
this regression equation as a predicting in- 
strument of teaching ability. S. Dis, = 
.768 based on the Multiple R of .640 (Table 


Personal Fitness Rating Scale—University of Wisconsin 


Wrightstone Scale of Civic Beliefs 
Torgerson Teacher-Pupil Relationship 


Yeager Scale for Measuring Attitudes Toward Teachers 


Sims Score Card for Socio-Economic Status 
Bernreuter Personality Inventory Bn 


; American Council Psychological Examination ------------_--- Pe Ra A lA OEE Se 


With the use of Aitken’s Method of Pivotal 
Condensation as in the first part of this sec- 
tion, another set of beta coefficients necessary 
for computing multiple correlations and mul- 
tiple regression equations was obtained. 
Table XXV shows the beta coefficients from 
which the multiple correlations have been 
obtained. The multiple correlations are re- 
ported in Table XXVI. 


Data in Table XXVI have been obtained
by the proper multiplication and addition of
the beta coefficients and the correlations with
the criterion indicated in Table XXV. The
same observation should be made here as
earlier: the error increases with the addition
of new variables, and with the addition of
variables 9 and 10 the multiple R ceases to
be significant at the 1% level.
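The "multiplication and addition" described above matches the usual identity in which the squared multiple correlation is the sum of each beta coefficient times the corresponding correlation with the criterion. The sketch below assumes that identity is the one used; the function names are hypothetical and the correlations are invented, not taken from Table XXV.

```python
import math


def multiple_r(betas, criterion_correlations):
    """Multiple R from beta coefficients and predictor-criterion correlations.

    Uses R^2 = sum(beta_i * r_0i), assumed here to be the computation
    the text describes.
    """
    r_squared = sum(b * r for b, r in zip(betas, criterion_correlations))
    return math.sqrt(r_squared)


def betas_two_predictors(r01, r02, r12):
    """Closed-form betas for two predictors, used only to build an example."""
    denom = 1.0 - r12 ** 2
    return (r01 - r02 * r12) / denom, (r02 - r01 * r12) / denom


# Hypothetical correlations for illustration.
r01, r02, r12 = 0.40, 0.30, 0.20
b1, b2 = betas_two_predictors(r01, r02, r12)
print(round(multiple_r([b1, b2], [r01, r02]), 3))
```

With more than two predictors the betas come from solving the full set of normal equations, which is the step the text assigns to Aitken's method.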


J. P. Guilford, Psychometric Methods (New York:
McGraw-Hill Book Company, Inc., 1936), p. 385.
$$S.D._{est.} = \sigma_{criterion}\sqrt{1 - R^2}$$
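The standard error of estimate quoted in this study can be checked from the multiple R. The sketch below assumes the criterion is expressed in standard scores (a standard deviation of 1.0); under that assumption it reproduces the two values reported in the text, .768 and .779.

```python
import math


def standard_error_of_estimate(multiple_r, criterion_sd=1.0):
    """S.D.est = criterion S.D. times sqrt(1 - R^2).

    criterion_sd=1.0 is an assumption (criterion in standard scores).
    """
    return criterion_sd * math.sqrt(1.0 - multiple_r ** 2)


for r in (0.640, 0.627):
    print(r, round(standard_error_of_estimate(r), 3))
```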


Considerable consistency appears to be 
found between obtained scores and predicted 
scores reported in Tables XXIV and XXVII. 
Thus it seems that the equations here used 
in predicting teaching ability in the field of 
the Social Studies may have some justifica- 
tion. With the addition of rating scales and a 
different combination of teacher measures 
than that used in this study it may be pos- 
sible to obtain reliable measures of teaching 
efficiency. 

Other regression equations may be formed
from data represented in this study, but the
above equations would seem to indicate that
many measures, each contributing only
moderately to teaching ability, may be
successfully combined to form more valid
measures of teaching ability. It is entirely
possible that while simple measures may
contribute only slightly to the measurement
of teaching efficiency, items of such tests may
be combined into statistically significant
instruments.





TABLE XXVI
MULTIPLE CORRELATIONS FOR VARIOUS COMBINATIONS OF TESTS AND THEIR SIGNIFICANCE

Row     R       R²      R²/(1−R²)
 1     .425    .181    .221
 2     .498    .248    .330
 3     .502    .252    .337
 4     .531    .282    .393
 5     .536    .287    .403
 6     .578    .334    .502
 7     .606    .368    .582
 8     .629    .396    .656
 9     .636    .404    .678
10     .640    .410    .693
*Significant at 5% level; 5 chances in 100 from an uncorrelated random population.
**Significant at 1% level; 1 chance in 100 from an uncorrelated random population.
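The second and third numeric columns of Table XXVI follow directly from the first. As a rough check, the sketch below recomputes R² and R²/(1−R²) from the ten multiple correlations; the results agree with the tabled figures to within about .002, the small differences presumably being due to rounding in the original hand computation. The function name is hypothetical.

```python
# Multiple correlations as listed in Table XXVI.
R_VALUES = [0.425, 0.498, 0.502, 0.531, 0.536,
            0.578, 0.606, 0.629, 0.636, 0.640]


def derived_columns(r):
    """Return (R^2, R^2 / (1 - R^2)) for one multiple correlation."""
    r2 = r * r
    return r2, r2 / (1.0 - r2)


for r in R_VALUES:
    r2, ratio = derived_columns(r)
    print(f"{r:.3f}  {r2:.3f}  {ratio:.3f}")
```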


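TABLE XXVII
ILLUSTRATION OF THE USE OF THE REGRESSION EQUATION

(The body of this table is not recoverable from the scan. Column heading surviving: Teacher. Entries surviving, in scan order: 4; 11; 9; 9; 12, 54, 56, 3, 86, 251, 2, 52, 18, 143; -.91, -.89. Entries surviving under the headings "1%" and "Sig.": 7.24, 5.13, 4.29, 3.81.)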
SECTION VI
SUMMARY AND CONCLUSIONS

The main objective of this study was to
determine the validity of certain measures of
teaching ability as correlated with pupil
change as a criterion. Intercorrelations were
calculated between pupil change and each of
the teacher measures and teams of such
measures. From these data it was possible to
determine those measures which seemed to
have most value as measures of teaching
ability. The measures possessing the highest
validity were:

1. Torgerson Teacher Rating Scale.
2. Michigan Rating Scale.
3. Hartmann Social Attitudes.
4. Almy-Sorenson Teacher Rating Scale.
5. A Personal Fitness Rating Scale.
6. Size of school (number of pupils).
7. A Personality Rating Scale.
8. Wrightstone Scale of Civic Beliefs.
9. Salary of the teacher.
10. Torgerson Teacher-Pupil Relationships.
11. Yeager Scale for Measuring Attitudes
    Toward Teachers and the Teaching
    Profession.


By combining the following measures into
a multiple regression equation a multiple
correlation of .627 was obtained, with a
S. D.est. of .779:

1. Torgerson Teacher Rating Scale.
2. Hartmann Social Attitudes of Teachers.
3. Wrightstone Scale of Civic Beliefs.
4. Torgerson Teacher-Pupil Relationship.
5. Yeager Scale for Measuring Attitudes
   Toward Teachers and the Teaching
   Profession.
6. Morris Trait Index L.
7. Bernreuter Personality Inventory, Bn.
8. American Council Psychological
   Examination.
9. American Council Civics and
   Government Test.


A second group of teacher measures gave
a somewhat higher multiple correlation. The
tests in this second set were selected on the
basis of their correlation with the criterion
and have been arranged in the order of size.
These measures include a large variety of
qualities and abilities, some of which appear
to show a definite relationship with the
criterion of teaching ability used in this study.
A list of the ten teacher measures making up
the group is given below:


1. Torgerson Teacher Rating Scale.
2. Hartmann Social Attitudes of Teachers.
3. Personal Fitness Rating Scale.
4. Wrightstone Scale of Civic Beliefs.
5. Torgerson Teacher-Pupil Relationship.
6. Yeager Scale for Measuring Attitudes
   Toward Teachers and the Teaching
   Profession.
7. Morris Trait Index L.
8. Sims Score Card for Socio-Economic
   Status, Form C.
9. Bernreuter Personality Inventory, Bn.
10. American Council Psychological
    Examination.


This combination gave a multiple correlation
of .640 with a standard error of estimate
of .768. It is also interesting to note that the
obtained scores of teaching ability (the
criterion scores) correlate with the predicted
scores of the 47 teachers to the same figure
as the multiple correlation, .640.


Other interesting and valuable combina- 
tions of these teacher measures are possible 
and some of them may prove valuable in pre- 
dicting teaching success as well as assisting 
in the better training of teachers in service. 

Although the data in this study are at 
times conflicting, certain conclusions relative 
to the evaluation of teaching efficiency seem 
warranted: 


1. Personality as here defined and measured
   seems to possess a positive relationship
   to good teaching (r = .35; r = .30).

2. Rating scales when used by experienced
   and competent supervisors for the
   purpose of evaluating teacher efficiency
   give a positive correlation (r = .36 to
   r = .43).

3. Social attitudes as measured by the
   Hartmann and Wrightstone Scales
   appear to be related to teaching
   efficiency (r = .38; r = .29).

4. Size of the school appears to possess
   significance in evaluating teaching
   efficiency as here measured (r = .31).

5. Teacher-pupil relationship as measured
   by the Torgerson Teacher-Pupil
   Relationship Test is positively correlated
   with teaching efficiency but not in an
   amount that is statistically significant
   (r = .22).

6. Attitudes toward teachers and the
   teaching profession as measured by the
   Yeager Scale are positively correlated
   with teaching efficiency but not in an
   amount that is statistically significant
   (r = .22).

7. The Bernreuter Personality Inventory,
   Bn (neurotic tendencies), shows a
   small negative relationship (r = -.14).

8. Dominance as measured by the
   Bernreuter Scale does not appear to
   contribute to teaching efficiency (r = .04).

9. Social adjustment as measured by the
   Washburne Scale seems not to be
   related to teaching efficiency (r = .06).

10. Intelligence as measured by the
    American Council Psychological
    Examination seems not to be related to
    teaching efficiency (r = -.10).

11. The age and experience of the teacher
    contribute little when measured
    against the criterion of pupil change
    as set up in this study (r = .01; r = .10).

12. Leadership as measured by the Morris
    Trait Index L is negatively correlated
    with teaching efficiency (r = -.17).

13. There appears to be considerable
    evidence that the teacher in these rural
    schools contributes less to pupil success
    than do teachers in schools where there
    is a single grade to be taught. This fact
    may throw some light upon the
    inconsistencies between this study and
    that reported by Rostker.
THE MEASUREMENT OF TEACHING ABILITY
STUDY NUMBER THREE 
C. V. LADUKE 


SECTION I 
THE PROBLEM 


This investigation was undertaken as a further
check on the results of two earlier studies
of teaching ability. The results of the two
earlier studies seemed to be in essential
agreement in most respects, but further research
seemed desirable. The purpose of this particular
study was to examine the validity of a selected
battery of tests shown by earlier investigations
to have particular promise.

The results of the study should assist in 
answering the following questions: 


1. What relationships do certain teacher 
factors, like intelligence, have to an 
objective criterion of teaching efficiency? 

2. How do supervisory ratings agree with an 
objective criterion of teaching efficiency? 

3. What validity may certain selected teacher 
measures have when validated against an 
objective criterion of pupil change? 


SECTION II 


EXPERIMENTAL DESIGN AND 
PROCEDURE 


The design of this study was controlled to 
a large extent by the criterion of teaching abil- 
ity which the investigator decided to use. The 
criterion was that of pupil change. The investi- 
gation consisted, therefore, chiefly of studying 
the relationship between pupil change and a 
number of teacher qualities under partially 
controlled conditions. 

The investigation was a part of a quite elaborate
study of radio in education,¹ undertaken
during the years 1937-39 under the supervision
of the Department of Education and Speech of
the University of Wisconsin. It was possible by
careful planning to set up a group of teachers
in the social science area in such a fashion that
it would not only meet the purposes of the
radio study but would also serve the purposes
of this study. The study had the advantage of
being quite adequately financed, making it
possible to provide exceptionally good supervision
of the project and collection of data.

¹ Reports by L. B. Rostker and J. F. Rolfe reproduced
... the University of Wisconsin ... Research ... in
... Broadcasting (Madison, Wisconsin: University of
Wisconsin Press, 1942), 203 pp.


SELECTION OF THE SCHOOLS 


Since teachers were to be evaluated in terms
of changes produced in their pupils, a
departmentalized system where pupils had more than
one teacher would complicate the problem, so
it was decided to confine the study to
one-teacher rural schools. It was quite essential that
the schools be located near the seat of
operations to reduce travel and expense; so the
names of all the rural schools having no radios
in seven Wisconsin counties near Madison were
put in a box and, after being thoroughly mixed,
the names of the required number of schools
were drawn at random. Extras were drawn to
offset those who might not cooperate. When the
teachers of these schools were contacted,
forty-one were glad to take part in the study and
were enrolled. Later, seven of these schools
were dropped due to withdrawal of pupils or
absence of pupils from tests, so complete data
were available for only 34 7th- and 8th-grade
classes enrolling a total of 200 pupils. The
enrollment in each school, the enrollment in
the participating class, the assessed valuation
of each district, and the annual district
expenditure are shown in Table I.

TABLE I
INFORMATION RELATIVE TO SCHOOLS, CLASSES, AND FINANCIAL SUPPORT

School        Participating Class Enrollment

Inspection of this table reveals a condition of
heterogeneity in all of the factors listed that
might be quite disturbing when one considers
that the teachers are to be measured by pupil
change, and that pupil change might be
determined in part by these factors. The disparity
in the district assessed valuation would
ordinarily be interpreted as differences in ability
to support schools. The state aid law in
Wisconsin, however, is so drawn and administered
that the school tax rate is practically no larger
in schools of low valuation than in schools of
large valuation, even though equal amounts are
spent for school support.²
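The drawing of school names from a box, with extras held in reserve, amounts to simple random sampling without replacement. A minimal sketch follows; the roster, the counts, and the name `draw_schools` are hypothetical, since the source does not state how many names were in the box or how many extras were drawn.

```python
import random


def draw_schools(eligible_schools, required, extras, seed=None):
    """Draw `required` schools plus `extras` reserves, without replacement."""
    rng = random.Random(seed)
    drawn = rng.sample(list(eligible_schools), required + extras)
    return drawn[:required], drawn[required:]


# Hypothetical roster of eligible one-teacher rural schools.
eligible = [f"School {n:03d}" for n in range(1, 201)]
primary, reserve = draw_schools(eligible, required=40, extras=10, seed=1938)
print(len(primary), len(reserve))
```

Reserves would be approached in the order drawn whenever a school in the primary list declined to cooperate.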

Class size varies from the smallest class of
three members to the largest of nine, and while
the classes are quite different relatively, all
are still in the small class category. Experimental
evidence, so far as could be determined, as to
size of class and teaching efficiency shows no
advantage to classes of either extreme.

In so far as could be determined, the same
line of reasoning applies to the effect of the
size of the school. While enrollment varied
from thirteen to thirty-seven, no evidence could
be found indicating that either extreme would
have any particular advantage.

Socio-economic factors that may be operative
in the schools employed in this investigation
are depicted in Table II. Review of these data
reveals great uniformity from school to school.
Occupationally the various districts are much
alike, being predominantly agricultural;
economic independence in all districts is indicated
by the very few on relief; nationality
(extraction) varies from community to community,
but since the percentage of foreign born is so
small, it is doubtful if the extraction factor
should be given much weight. In other words,
they are all Americans.

² C. ..., “Our Equalization Law,” Wisconsin Journal of
Education (... 1933), pp. 313-314.

TABLE I (continued)
Total District Expenditure, 1937-1938: $1490; 970
Assessed Valuation of District: $250,000; 119,000; 153,000; 166,000; 415,000; 260,000; 212,000; 205,000; 283,000; 414,000; 164,000; 259,000; 282,000; 271,000; 200,000
Enrollment in Participating Class: 4, 4, 7, 7, 5, 5, 6, 7, 9, 7, 5, 6, 7, 8, 6, 7, 6, 3, 8, 5, 5, 5, 5, 7, 5, 7, 4, 5, 8, 6, 5, 4, 5, 7


THE PARTICIPATING TEACHERS 


The random selection of schools described
above resulted in the selection of the teachers
as well, since the schools are all one-teacher
schools. Table III summarizes information
collected with reference to the teachers. In age the
teachers varied from 21 to 44; in salary from
$80 to $115 a month; in experience from one
to 16 years; in tenure of present position from
none to five years; in certification from county
third grade to state unlimited certificate; and
in professional training for teaching, from two
semesters in county training school to eight
semesters in teachers college and university. Of
the thirty-four teachers, four were men.

TABLE II
DATA RELATING TO SOCIO-ECONOMIC FACTORS

Occupation: Percent Agr.; Percent ...
(The body of this table is not recoverable from the scan.)

* Data for school number 18 not available.


The typical teacher then was a twenty-four 
year old woman, with one year of professional 
training in a county normal school, holding a 
first-grade county teacher's certificate, having 
three years of teaching experience previous to 
the year of the investigation, having taught one 
year in the present position, and receiving a 
salary of ninety dollars a month or $810 a year. 


PREPARATION OF COURSE OF STUDY 


The outline of the course of study for
Community Living, composed of eight units, was
prepared by members of the State Department
of Public Instruction, a committee of teachers
of the social studies who volunteered to assist
in the planning of the series, and members of
the radio project research staff. The attempt was
made to portray the progressive development
of the political and social organization of our
democratic society. The child's responsibility in
the functioning of democratic society was to be
emphasized.

TABLE II (continued)
Column headings: Percent Other; Percent Relief or W.P.A.; Percent Foreign Born; Predominating Nationality in Community. Surviving entries: 5% (Polish); 20% (German).

The following outline and schedule was 
developed: 


Unit I. Your Family, Home, and Community 


Sept. 26. You and your family 
Oct. 3. You and your home 
Oct. 10. Your home and your community 


Unit II. How the Community Serves You


Oct. 17. Safe highways 

Oct. 24. Protection of life and property 

Oct. 31. Education—your opportunity 

Nov. 7. Recreation—a new community service 
Nov. 14. Health—a community problem 
TABLE III
DATA RELATIVE TO PARTICIPATING TEACHERS

Column headings surviving: License Held; Training beyond H. S. in semesters (County Normal, Teacher College, Other).
Entries surviving, in scan order: 85, 105, 95, 115, 110.
Sex: F (15 entries), M, M, F, F, F, M, F (9 entries).
License Held: County (1), County (1), Rural, ... (1), County (7 entries), State, State, County (11 entries), State (life), County, State (life), County, County.
(The remainder of the table is not recoverable from the scan.)

* Schools number 28, 31, and 39 were omitted to secure homogeneity.


Unit III. How the State Serves You


Nov. 21. Conservation of natural resources 

Nov. 28. State protection for producer and 
consumer 

Dec. 5. Services of the state university 

Dec. 12. The Wisconsin Social Security Law 


Unit IV. How the National Government 
Serves You 


Jan. 2. Uncle Sam carries the mail 
Jan. 9. Government research serves your home 
Jan. 16. Government regulation—labor,
communication, etc.

Jan. 23. Uncle Sam cares for the unemployed 


Unit V. How Your Government is Organized 
and Supported 


Feb. 6. Managing your local government 
Feb. 13. Managing your state government 
Feb. 20. Managing your national government 
Feb. 27. Paying the bill together—taxation 


Unit VI. Making a Living in Your Community 

March 6. Making a living in the country— 
agriculture 

March 13. Making a living in the city—Wis- 
consin industries 

March 20. Workers’ problems in town and 
country 

March 27. Buying and selling together—co- 
operatives 


Unit VII. Social and Political Groups
April 3. Nationality groups in Wisconsin 
April 10. Social life in your community 
April 17. Political parties 
April 24. Your part in democracy 


Unit VIII. Your Community and World 
Society 


May 1. Your state and world markets 


May 8. Your country and world problems 
May 15. Your part in the world community 
The objectives of the course were formulated
by members of the research staff who attended
the Progressive Education Workshop held in
Bronxville, New York, during June and July,
1938. Leaders in the field of social studies who
were in attendance at the conference assisted
in a large measure in the formation of these
objectives.

The following specific objectives for Unit
III, How the State Serves You and Your
Community, are illustrative of the objectives
developed:³

SPECIFIC OBJECTIVES


A. Functional Information: 


(1) To develop an understanding of how 
the state serves its citizens and their 
many interests and needs. 

(2) To indicate the ways in which the 
services of the state are related to the 
services of local government. 

(3) To indicate the community needs which 
make state services necessary. 

(4) To indicate the manner in which the 
state provides for the conservation of its 
natural resources. 

(5) To indicate the ways in which the state 
attempts to protect the interests of the 
producer and consumer. 

(6) To indicate some of the services pro- 
vided by the state university for the 
citizens of the state. 

(7) To indicate the manner in which the 
state provides for social welfare—social 
security. 

(8) To indicate the deficiencies in the serv- 
ice of the state. 


B. Interests: 


(1) To develop an interest in the functions 
of the state government. 

(2) To develop an interest in the ways the 
state protects our natural resources. 

(3) To become interested in the ways in 
which the state protects the producer 
and the consumer. 

(4) To become interested in the services 
provided by the state university. 

(5) To become interested in the problem of 
providing for the social welfare of its 
citizens. 

³ For complete list of objectives, see Teachers’ Manual,
Appendix D, original thesis on file in the University
Library, University of Wisconsin.


C. Appreciations: 


(1) To recognize the value of the services 
of the state to the individual and to the 
community. 

(2) To recognize the importance of con- 
serving our natural resources. 

(3) To develop a recognition of the state's 
responsibility for social welfare. 

(4) To recognize the importance of the 
services provided by the state university. 

(5) To develop an appreciation of the inter- 
dependence of the producer and the 
consumer. 

(6) To develop an appreciation of the indi- 
vidual contributions to the growth of 
state services. 


D. Attitudes: 


(1) To develop an attitude of cooperation 
toward state services. 

(2) To develop an attitude of concern with 
regard to state activities. 

(3) To develop a sense of responsibility in 
improving the services of the state. 

(4) To promote an attitude of concern re- 
garding the conservation of natural re- 
sources. 

(5) To develop an attitude of consideration 
of the rights and needs of the different 
interest groups within the state in regard 
to governmental services. 


COLLECTION OF PUPIL DATA


Based upon objectives such as those
illustrated above, objective tests measuring
appreciations, attitudes, information, and interests⁴
were constructed.

These tests, arranged in two test booklets,
were administered to all the pupils in the study
by field investigators during the week preceding
October 17, 1938. The purpose of these
tests was to secure a measure of the status of
the pupils at the beginning of the experimental
period, during which the course in Community
Living was presented. One thirty-five minute
class period per day was devoted to the study
of Community Living. Each teacher was free
to present the material by any method she
might choose, but not more than one class
period a week was to be devoted to any one
topic.

⁴ For discussion of test construction, see ... in original
thesis on file in the University Library, University of
Wisconsin.
During the week following April 24, 1939,
field investigators again administered the test
that had been used at the beginning of the
teaching period. Thus a measure of the pupil
status was secured at the end of the period of
instruction. Pupil change during the intervening
six-month period was determined by
subtracting the scores on the pre-test from those
on the final test.
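The pupil-change criterion described above is a simple gain score, later summarized as class means. A minimal sketch, assuming scores are keyed by pupil and that a pupil missing either test is left out (as the text says pupils absent from tests were dropped); the names and scores are invented:

```python
from statistics import mean


def pupil_gains(pre_scores, final_scores):
    """Gain per pupil: final-test score minus pre-test score.

    Pupils lacking either score are skipped (assumed handling).
    """
    return {
        pupil: final_scores[pupil] - pre_scores[pupil]
        for pupil in pre_scores
        if pupil in final_scores
    }


def class_mean_gain(pre_scores, final_scores):
    """Mean gain for one class, the figure attributed to its teacher."""
    return mean(pupil_gains(pre_scores, final_scores).values())


# Hypothetical class of three pupils.
pre = {"pupil_a": 41, "pupil_b": 52, "pupil_c": 38}
final = {"pupil_a": 47, "pupil_b": 55, "pupil_c": 49}
print(pupil_gains(pre, final), round(class_mean_gain(pre, final), 2))
```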


Shortly after the pre-test was administered,
field workers gave the pupils the
Kuhlmann-Anderson Intelligence Test, from which each
pupil’s M.A. and I.Q. were determined.


COLLECTION OF TEACHER DATA 


To measure teacher qualities the teachers
were assembled at central meeting places,
usually the county seat, and tests were administered
by field workers. A measure of the teacher's
intelligence was secured by administering the
American Council Psychological Examination.⁵
The teacher’s knowledge of mental hygiene was
measured by administering Torgerson’s Theory
and Practice of Mental Hygiene test. Yeager’s
Scale for Measuring Attitude toward Teachers
and Teaching was administered to secure a
measure of the teacher's interest in her work.


Besides these measures which were secured 
under controlled conditions, two measures were 
sent to the teachers by mail: Harnly’s State- 
ments about Education, which purports to give 
a liberal-conservative position measure, and 
Jackson's Social Proficiency Test, which aims to
measure “consideration for others.” Since these
tests have no right or wrong answers, but in- 
stead reflect the individual teacher's attitude, 
unsupervised administration appears to be a 
valid procedure. 

Besides these teacher measures, secured
through paper and pencil tests, ratings of the
teachers were secured from the county
superintendent and the county supervising teacher.
Two copies of three rating blanks, the
Torgerson Diagnostic Rating Scale, the
Almy-Sorenson Rating Scale, and the Michigan
Rating Scale, were sent to the county
superintendent under whom the teacher worked, with
the request that he and the supervising teacher
make separate ratings for each of the teachers
participating in the study. These ratings were
to be based not on any one visitation but rather
upon their cumulative impressions. Since these
officers know most of their teachers well, such
ratings should be as reliable as it is possible
to obtain.

⁵ All teacher measures are described in Section III of this
study.


SECTION III 
DESCRIPTION OF PUPIL AND TEACHER 
PUPIL TESTS

The pupil tests used in this investigation
assume great importance since the changes in
the pupils are determined by these tests and the
teachers, in turn, are measured by the changes.
Two types of pupil tests were used: (a) those
measuring factors conditioning pupil
achievement and (b) those measuring pupil
achievement.


The Kuhlmann-Anderson intelligence test
for seventh and eighth grades⁶ was used to
measure intelligence.


A specially constructed test of information,
appreciations, attitudes, and interests as related
to community living was used in measuring
pupil achievement. The questions were
assembled into two test booklets,⁷ labeled “Social
Studies Questionnaire,” Form A and Form B,
with nothing to indicate to the pupil whether a
question measured information, appreciation,
interest, or an attitude. The questions in the
last three named areas were placed in random
order so as to give minimum assistance to the
pupil as to their character. Table IV below
identifies the questions by areas. The reason for
dividing the questionnaire into A and B forms
was simply to make it possible to administer the
tests in two different testing periods, thus
avoiding too prolonged testing at any one time.


PREPARATION AND VALIDATION OF THE PUPIL 
ACHIEVEMENT MEASURES 


The test items in the questionnaire described
above were constructed and validated by
members of the Wisconsin radio project staff⁸ in
attendance at the Progressive Education
Association summer workshop, Bronxville, New
York, during June and July, 1938. The
following definitions were accepted:

Functional Information was taken to mean:
Facts or information in a given area useful
in everyday living.

Interests were taken to mean:
Desire to extend or intensify one’s
knowledge about or experience with a given
area.

Appreciations were taken to mean:
Recognition of social significance of a given
area.

Attitude was taken to mean:
A mental set which conditions the direction
that one’s behavior (mental or physical)
will take when responding to some
stimulus within the area.

⁶ F. Kuhlmann and Rose Anderson, Kuhlmann-Anderson
Test, Fourth Edition, Grades VII and VIII (Minneapolis,
Minnesota: Educational Test Bureau, Inc., 1927).

⁷ Copies of these question booklets may be found in
Appendix D, original thesis on file in the University
Library, University of Wisconsin.

⁸ A. G. Hellfritzsch, “Preliminary ... on Community
Living Section of Radio Study,” Unpublished material,
University of Wisconsin, 1940.

TABLE IV
DISTRIBUTION OF ITEMS IN THE SOCIAL STUDIES QUESTIONNAIRE

ATTITUDE, 90 items: Form A ...; Form B ... (surviving entries: 120, 121)
INFORMATION, 110 items: Form A, all of Part II; Form B, all of Parts II and III
(The remaining columns are not recoverable from the scan.)


Based on these definitions, objectives in the
different areas were established and test items
to determine status of pupils with reference to
these objectives were drawn up. National
leaders in the field of social studies, in
attendance at the conference, contributed liberally to
the crystallization of definitions, objectives, and
test items. Thus a preliminary form of the test
consisting of 110 interest, 112 attitude, 120
appreciation, and 204 information items was
constructed. The test was then administered to
two hundred seventh- and eighth-grade pupils
and the items validated by the upper and lower
third method, using the total score for each
type of items as the criterion score in validating
items of that type. To be retained in the test
the item difficulty had to exceed 20% for the
total group and 10% for the upper third.
Discrimination was considered significant at the
1% level.
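The upper and lower third method can be sketched as follows: pupils are ranked on the criterion score for that type of item, and the proportion passing a given item is computed for the whole group and for the top and bottom thirds. This is only an illustration under assumptions. The source does not say how ties were broken or which significance test was applied to the difference, so neither is modeled here, and the function name is hypothetical.

```python
def upper_lower_thirds(criterion_scores, item_passed):
    """Return (p_total, p_upper, p_lower) for one item.

    criterion_scores: total score per pupil on that type of item.
    item_passed: 1 if the pupil passed the item, else 0 (same order).
    """
    n = len(criterion_scores)
    if n != len(item_passed) or n < 3:
        raise ValueError("need matching lists of at least three pupils")
    order = sorted(range(n), key=lambda i: criterion_scores[i])
    third = n // 3
    lower = order[:third]
    upper = order[n - third:]

    def proportion(indices):
        indices = list(indices)
        return sum(item_passed[i] for i in indices) / len(indices)

    return proportion(range(n)), proportion(upper), proportion(lower)


# Hypothetical data for six pupils.
print(upper_lower_thirds([10, 20, 30, 40, 50, 60], [0, 0, 1, 0, 1, 1]))
```

An item would be kept only if these proportions met the difficulty limits stated in the text and the upper and lower thirds differed significantly.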

The items that were retained, namely: 58 
interest, 90 attitude, 94 appreciation, and 110 
information were put into the booklet form 
described earlier in this section and constituted 
the pupil tests used to measure pupil growth. 

The reliabilities of the tests, determined by 
the split-halves method for 150 pupils selected 
at random from the radio study are listed in 
Table IV. 


TABLE LXI
RELIABILITIES OF PUPILS’ TESTS

Test            Pre-test   Final test   Gain
Appreciation      .80        .88        .64
...               .70        .81        .62
Information       .91        .84        .72
...               ...        .89        .58

TEACHER TESTS 


Five teacher tests were administered to the 
teachers participating in the investigation. A 
brief description of each test and the reason 
for its inclusion follows: 


1. The American Council on Education
Psychological Examination for College Freshmen,
1936 Edition.⁹

2. Theory and Practice of Mental Hygiene.¹⁰

3. A Scale for Measuring Attitudes toward
Teachers and the Teaching Profession.¹¹

4. Statements about education.¹² This
instrument is an attitude scale consisting of
eighty statements about education divided into
four parts: “Some Purposes of Education,” ...
“... Shall We Teach?” and “How Shall We
Teach?” Each division contains ten statements
worded from a progressive position and ten
worded from a conservative position. The
teacher is asked to respond to the statements
by indicating agreement or disagreement on a
five point scale. In scoring, one point is
assigned to the most liberal reaction and five
points to the most conservative reaction. A low
score on the whole test, then, indicates a liberal
position, while a high score indicates a
conservative position. The author reports a split
halves reliability of .87 ± .02 which, when
corrected by the Spearman-Brown formula,
becomes .93 ± .01. The scale yields two kinds
of data: the total score, indicating a general
attitude of progressivism or conservatism, and
a part score for each of the four divisions,
indicating general attitude in each of these areas.
The scale was included in this study to discover
if the progressivism or conservatism of the
teacher as here measured has any relation to
teaching efficiency.

⁹ Prepared by L. L. Thurstone and Thelma Gwinn
Thurstone. Published by the American Council on Education,
Washington, D. C. For a full description of this test see ....

¹⁰ ... Department of Education, University of Wisconsin.
Unpublished. For full description of test see ....

¹¹ ... Yeager, An Analysis of Certain Traits of Selected
High School Seniors Interested in Teaching, Contributions
to Education, No. 660 (New York: Bureau of Publications,
Teachers College, Columbia University, 1935). For ...
test see ....

¹² Paul W. Harnly, “Attitudes of High School Seniors
Toward Education,” The School Review, XLVII (September,
1939), pp. 501-509.

5. Social proficiency test.¹³ The author of
this test defines social proficiency as
“consideration for others.” The measure consists of
fifty-two social situations to each of which four
solutions (responses) are offered. The teacher
checks the solution he would use if faced with
that particular situation. There are no right or
wrong responses. The aim of the questionnaire
is to get a status view of the teacher's
consideration for others, that is, of his social
proficiency. The key assigns a value to each of
the four choices, varying in weight from one to
eight, the largest weight being assigned to the
choice that shows the greatest consideration.
Hence, a high total score indicates a high degree
of social proficiency.

The measure was included in this study
because, according to the author, it measures some
quality not measured by psychological and
personality tests, and because it seems that, other
things being equal, the teacher with more
consideration for others might be the better
teacher.

¹³ ... Journal of Experimental Education, VIII (June,
1940), pp. 422-474.
TABLE V
CLASS MEANS FOR TEST IN ATTITUDES


PRE-TEST
47.86
56.60
52.40
44.33
38.14
52.89
38.00
38.40
* From E. F. Lindquist, Statistical Analysis in Educational Research (Boston: Houghton Mifflin Co., 1940), pp. 48-50.
** Based on 181 pupils, Class Nos. 28, 31, and 39 omitted to secure homogeneity.
*** Based on 31 class means, Nos. 28, 31, and 39 omitted to secure homogeneity.


The author reports the contingency coefficient
of validity of the test to be .78 and the
product-moment validity coefficient as .92. The
criterion was the judgment of intimate friends
relative to social proficiency. The reliability
coefficient was .82 when “stepped up” by the
Spearman-Brown formula.


TEACHER RATING SCALES 


Three teacher rating scales were used for the 
purpose of getting supervisory ratings: 


1. Torgerson Diagnostic Teacher Rating 
Scale of Instructional Activities.** 

2. Almy-Sorenson Rating Scale for Teach- 
ers.1® 

3. The re Rating Scale.** 

T. L. Torgerson, “Diagnostic Teacher Rating Scale of Instructional Activities,” Bloomington, Illinois: Public School Publishing Co., 1930.

“Almy-Sorenson Rating Scale for Teachers,” Bloomington, Illinois: Public School Publishing Co., Inc.
%* “Michigan Education Association gy eae 
Teachers 


(Lansing, Association, 1 
For full description of this test see p. 19. 
TABLE VI
CLASS MEANS FOR TEST IN APPRECIATION


PRE-TEST     FINAL TEST
σ*           Mean      σ
13.07        57.50     12.92
19.21        49.25     30.59
11.58        51.42     13.62
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*** Refer to footnotes Table V. 


SECTION IV 


DEVELOPING THE CRITERIA OF 
TEACHING ABILITY 


Since the teachers in this study are to be 
evaluated in terms of pupil change, the best 
teacher being the one who secures the greatest 
amount of change, the determination of these 
changes assumes utmost significance. 

The first task then was to secure a measure 
or measures of pupil change, and the second to 
determine the amount of this change that may 
be ascribed to the teacher. This section will be 
devoted to an account of the procedures used 
to determine these changes. 

The pupil measures used in this study have 
already been described in Section III. They 
purported to measure changes in appreciation,
attitude, information, and interest as related to
the problem of “Community Living.” The
mean change scores for each of the thirty-four
classes, as well as the mean pre-test and final scores
from which they were derived, are shown in
Tables V, VI, VII and VIII.


THE PROBLEM OF HOMOGENEITY 


Since each teacher is to be measured on the 
basis of the average change in her class, it is 
evident, that if such measurement is to be valid, 
the classes will have to be approximately alike 
at the beginning of the teaching period. That 
is, to be fair to each teacher, no class should 
differ markedly from another in such factors 
as mental age, intelligence, informational status, 







September, 1945| THE MEASUREMENT OF TEACHING ABILITY 


TABLE VII 
CLASS MEANS FOR TEST IN INFORMATION


PRE-TEST 
Mean 


59. 25 


N 


0. 
4 
4 
7 
7 
5 
5 
6 
7 
9 
7 
5 
6 
7 
8 
6 
7 
6 
3 
8 
5 
5 
5 
5 
7 
5 
7 
4 
5 
8 
6 
5 
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5 
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24 


52.97 
*** Refer to footnote Table V. 


socio-economic status, etc. Table IX lists the
mean intelligence quotients and the mean
mental ages of the pupils in the thirty-four
classes.


To test homogeneity, the F test described by
Snedecor was applied to the thirty-four
classes. This test measures the significance of
differences among any number of group means
and determines whether they vary more than
would be expected in a random sampling from
a homogeneous, normally distributed population.
The F values and the 5% and 1%
tabular values with which they are compared
are shown in Table X. A value of F greater
than the tabular 5% value indicates that the
means vary more than would be expected in


G. W. Snedecor, Statistical Methods (Ames, Iowa:
Collegiate Press, 1938), pp. 184-198.


o* 
10. 72 
18. 34 
14. 53 
17. 26 

9. 86 

6.11 

8. 30 

8. 50 


FINAL TEST 
Mean o 
68.75 12.31 
52.75 22.97 
random sampling from a homogeneous, normally
distributed population.
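
The F test for differences among class means can be illustrated with a short sketch. It assumes the ordinary one-way analysis of variance (mean square between classes divided by mean square within classes); the three small classes below are hypothetical and are not data from this study.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of class means about the grand mean, weighted by class size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of pupils about their own class mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical pre-test scores for three classes (illustration only)
classes = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value, df_b, df_w = one_way_f(classes)
print(f_value, df_b, df_w)
```

The computed F would then be compared with the tabular 5% and 1% values for the corresponding degrees of freedom, as is done in the text.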

As may be seen in Table X, the F values of
four of the pupil factors indicate heterogeneity
among the class means greater than expected
at the 5% level. To secure homogeneity,
classes with discrepant means were deleted
from the group one at a time until the F values
fell within the desired 5% tabular values. To
secure this desired degree of homogeneity,
three classes (Nos. 28, 31, and 39) had to be
deleted. Table XI lists the F values of the
remaining 31 classes with the comparative 5%
and 1% tabular values. Most of the F values
have been reduced below not only the 5%
value but the 1% value as well.

One of the fundamental conditions prerequisite
to the application of the F test de-
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TABLE VIII 
CLASS MEANS FOR TEST IN INTEREST


No 


qd 
4 
7 
7 
5 
5 
6 
7 
9 
7 
5 
6 
7 
8 
6 
7 
6 
3 
8 
5 
5 
5 
5 
7 
5 
7 
4 
5 
8 
6 
5 
a 
5 
7 
200 
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** *** Refer to footnotes Table V. 


scribed is that the variances of the several 
classes must not differ significantly among 
themselves.** That this condition prevailed was 
determined by applying a chi-square test de- 
scribed by Rider*® and by Lindquist.*° The chi- 
square values obtained for the thirty-four 
classes are indicated in Table X together with 
the comparative 5% and 1% tabular values. 
The effect of the deletion of three classes upon 
the variance of the classes is shown in Table 
XI. (The variances of the thirty-one classes 
were well within the 5% value.) 
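
A chi-square test of the homogeneity of several variances may be sketched as follows. The report does not name the test beyond citing Rider and Lindquist; the sketch assumes it is of the Bartlett type, with k − 1 degrees of freedom for k classes. That assumption agrees with the tabular values of 43.77 and 50.89 quoted for the 31 classes, which are the 5% and 1% chi-square points for 30 degrees of freedom. The data shown are hypothetical.

```python
import math
from statistics import variance


def bartlett_chi_square(groups):
    """Return (chi_square, df) for Bartlett's test of equal variances."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]          # degrees of freedom per class
    variances = [variance(g) for g in groups]   # unbiased class variances
    df_total = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1.0 + (sum(1.0 / d for d in dfs) - 1.0 / df_total) / (3.0 * (k - 1))
    return numerator / correction, k - 1


# Hypothetical classes: equal variances give a chi-square of zero
chi_sq, df = bartlett_chi_square([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(chi_sq, df)
```

A chi-square smaller than the tabular 5% value for the given degrees of freedom would indicate that the class variances do not differ significantly among themselves.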


Ibid., p. 208.
Paul R. Rider, An Introduction to Modern Statistical
Methods (New York: John Wiley & Sons, Inc., 1939), pp.


E. F. Lindquist, Statistical Analysis in Educational
Research (Boston: Houghton Mifflin Company, 1940), pp.
FACTORS AFFECTING CHANGE


Having thus established through the analysis
of variance technique that thirty-one of the
thirty-four classes do not vary more than would
be expected in a random sampling from a
homogeneous, normally distributed population,
the four sets of change scores (Tables
XII, XIII, XIV and XV) for these thirty-one
schools were taken and arranged in
descending order, for the four criteria of teaching
effectiveness based on changes in appreciation,
attitude, information, and interest,
respectively. This procedure is not wholly
defensible, however, since it assumes that all of
the change or absence of change made by the
various classes during the experimental period
is due to the teacher. That such assumption is
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TABLE IX 
MENTAL AGE AND INTELLIGENCE QUOTIENTS 
M.A 


< 
B 


N 


o. 
4 
4 
7 
7 
5 
5 
6 
7 
9 
7 
5 
6 
7 
8 
6 
7 
6 
3 
8 
5 
5 
5 
5 
7 
5 
7 
4 
5 
8 
6 
5 
4 
5 
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*** Refer to footnotes Table V. 


TABLE X
F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF MEANS AND VARIANCE OF PUPIL FACTORS IN THE 34 CLASSES

Pupil Factor                        F        Chi-square
Information pre-test                2.636    44.352
Attitude pre-test                   2.330
Appreciation pre-test               2.154
Interest pre-test                   2.334
                                    1.280
                                    1.936
5% tabular value for 34 classes     1.50
1% tabular value for 34 classes


TABLE XI
F AND CHI-SQUARE VALUES USED TO DETERMINE HOMOGENEITY OF MEANS AND VARIANCE OF PUPIL FACTORS IN THE 31 CLASSES

Pupil Factor                        F        Chi-square
Information pre-test                1.364    36.755
Attitude pre-test                   1.872    38.640
Appreciation pre-test               1.734    40.796
Interest pre-test                   1.198    28.080
                                    1.279    41.580
                                    1.265    36.337
5% tabular value for 31 classes     1.54     43.77
1% tabular value for 31 classes     1.83     50.89
TABLE XII
RELATION OF PRE-TEST TO CHANGE


                                  Appreciation   Attitude   Information
200 pupils (original group)           -.40         -.49        -.53
181 pupils (homogeneous group)        -.29         -.40        -.40
31 class means                        -.03         -.38        -.31


TABLE XIII


RELATION OF I. Q. TO CHANGE


Appreciation   Attitude   Information
    .09           .06         .18
    .09           .06         .20
    .12           .01         .35


TABLE XIV 
RELATION OF M. A. TO CHANGE 


Appreciation 


untenable is revealed by study of the relationship
between some of the pupil factors and
pupil gain as indicated by coefficients of
correlation in Tables XII, XIII and XIV:


Because many of the correlations are small, 
it is essential to know how large they must be 


to be significantly different from zero. Lind- 
quist** provides a formula and table for deter- 
mining significance at the 1% and 5% levels. 
At the 5% level: 


When N = 200, r is significant if it exceeds .14.
When N = 181, r is significant if it exceeds .15.
When N = 31, r is significant if it exceeds .35.
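
These limits can be approximated from the t distribution. The sketch below is not Lindquist's own table; it assumes the common relation r = t / sqrt(t² + N − 2), and the two-tailed 5% values of t are entered by hand from a standard t table as assumptions.

```python
import math


def critical_r(n: int, t_crit: float) -> float:
    """Smallest |r| significant for n pairs, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed 5% points of t (assumed values taken from a standard t table)
T_FIVE_PERCENT = {200: 1.972, 181: 1.973, 31: 2.045}

for n, t_crit in T_FIVE_PERCENT.items():
    print(n, round(critical_r(n, t_crit), 3))
```

Under these assumptions the limits come to about .139, .146, and .355, in close agreement with the values of .14, .15, and .35 given above.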


THE HIGH PRE-TEST HANDICAP


Inspection of the above correlations in the 
light of this measure of significance indicates 
that the only significant relationships that need 
to concern us here are those between pre-test 
scores and gains. The relation of pre-test score 
to gain is definite. It is evident that pupils 
starting with low pre-test scores make the 
greater gains and those starting with high pre- 
test scores make the lesser gains. This relation- 
ship holds not only for individuals in the total 
group but also for the classes, except in the 
case of appreciation. This means that a teacher 
starting with a class having a high pre-test 
score is handicapped, since by virtue of this 
pre-test position alone, the gain will be cor- 
respondingly low; while a teacher starting with 

Ibid., pp. 210-212.


Attitude Information 


01 —. 02 o-e —.04 
. 03 . 00 .13 —. 02 
.31 —.02 -47 - 06 


a class having a low mean pre-test score will, 
due to pre-test position alone, have an advan- 
tage, as the gain will be correspondingly high. 


This tendency has been found and noted in 
other studies. Rostker*? found negative coeffi- 
cients between all pre-test and gain scores, 
ranging in magnitude from —.18 to —.54 in 
eight measures. Rolfe* likewise found negative 
coefficients between all pre-test and gain scores 
ranging from —.29 to —.67. Von Eschen* 
found coefficients of correlation between pre- 
test and gain ranging from —.37 to —.83. 


A plausible explanation of this phenomenon
of negative relationship between pre-test and
gain scores is found in the so-called “ceiling”
concept: a test of a given number of items
limits the student with high pre-test scores,
since he may reach the “ceiling,” whereas the
student with low pre-test scores has ample
opportunity to make rather extensive gain without
nearing the “ceiling.” Examination of the
distribution of scores in the tests in the present
study indicates that this explanation might
have some validity in one area tested, namely,
that of Interest, but not in the other tests. For
these other areas the “ceiling” concept does
not appear to apply.

Leon E. Rostker, “The Measurement of Teaching Abil- 
ity,” published herewith. 
Jean Rolfe, “The Measurement of Teaching Ability,”
published herewith.
Von Eschen, “An Evaluation of a Super- 


™ Clarence R. ’ 
by | a with Seventh- and Eighth-Grade Teachers in 
the State of Wisconsin,” published herewith. 
The learning curve plateau concept as
explanatory of the negative pre-test gain relationship
appears logical if the test used is of the
power type, but inspection of the test scores in
this study would hardly permit such an
interpretation. Possibly the tendency for teachers to
give extra time in coaching pupils with low
pre-test scores may result in the extra increments.
Since the tests were not, however, scored by the
teachers, nor were the individual pupil results
reported to them, there was no opportunity for
them to know the pupils with low initial test
scores.*


TREATMENT OF THE HIGH PRE-TEST SCORES 


In this study a special application of the 
multiple regression equation is employed to 
correct for high initial test scores. 

The extent and the direction of the relation
of certain pupil factors (pre-score, I.Q., and
M.A.) to gain score has been shown. On the
basis of these known relationships, prediction
of the direction and magnitude of gain was
made through the use of the multiple regression
technique. Having predicted the amount of
gain that may be expected, due to known pupil
factors, this predicted gain may then be
subtracted from the actual gain to secure a measure
of the contribution of factors other than those
included in the regression equation, among
which the teacher is one. If other factors such
as the home, radio, health, etc., may be assumed
to be constant or randomly distributed
from class to class, the residual gain,
i.e., the actual gain less the predicted gain, may
then be attributed to the teacher; the residual
gain thus becomes an indirect measure of
teaching efficiency.

The proof of the above treatment appears 
simple: Having found significant correlation 
between I.Q., M.A., and pre-test with change,
a multiple regression equation can be estab- 
lished which when applied should predict the 
amount of pupil change expected from these 
factors. The known change less the predicted 
change should equal the change that may be 
ascribed to unmeasured factors (teaching plus 
a constant). The check on the accuracy of this 
solution consists of correlating the residual 
scores with the pre-test scores, and examining 
the resulting coefficients. 

In line with the foregoing discussion, four 
criteria of teaching effectiveness based on 


A statement relative to the statistical significance of 
these negative correlations will be issued at a later date. 


changes in pupils in Appreciation, Attitude, 
Information, and Interest as related to Com- 
munity Living were developed. A Composite of 
the four constituted a fifth criterion. 

The relationships, expressed as coefficients of 
correlation, necessary to develop the multiple 
regression coefficients are listed in Table XV. 

The Beta coefficients obtained by the Aitken 
method** are reported in Table XVI. 

The resulting prediction equation for Appreciation
is written:


$G_1 = -.060x_1 + .498x_2 - .268x_3$

Where $G_1$ = predicted gain in appreciation

$x_1$ = I.Q. class mean (changed to a
standard score)

$x_2$ = M.A. class mean (changed to a
standard score)

$x_3$ = Appreciation pre-test (changed
to a standard score)


The prediction equations for gain in Atti- 
tude, Information, and Interest were similarly 
developed. Applied to the class means changed 
to standard scores,** gains for each class in 
each of the four measures were predicted.*® 
These predicted gains were then subtracted 
from the observed gains and the differences 
(the residual gains)*° ascribed to the teacher, 
plus factors not measured, assumed to be con- 
stant from class to class. 
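
The residual-gain procedure may be sketched as below. The beta weights are those read from the appreciation equation (the sign of the M.A. term is an assumption, the printed equation being imperfect), the gain is assumed to be expressed in standard-score form like the predictors, and the four classes are hypothetical. Function names are illustrative only.

```python
from statistics import mean, pstdev

# Beta weights for I.Q., M.A., and appreciation pre-test (assumed reading)
BETAS = (-0.060, 0.498, -0.268)


def standard_scores(values):
    """Convert raw class means to standard scores (population sigma assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def predicted_gain(z_iq, z_ma, z_pre, betas=BETAS):
    """Gain expected from pupil factors alone, in standard-score units."""
    b1, b2, b3 = betas
    return b1 * z_iq + b2 * z_ma + b3 * z_pre


def residual_gains(iq, ma, pre, gain):
    """Observed standardized gain less predicted gain, one value per class."""
    z_iq, z_ma, z_pre, z_gain = (standard_scores(s) for s in (iq, ma, pre, gain))
    return [
        g - predicted_gain(a, b, c)
        for a, b, c, g in zip(z_iq, z_ma, z_pre, z_gain)
    ]


# Hypothetical class means for four classes (illustration only)
residuals = residual_gains(
    iq=[100, 105, 98, 110],
    ma=[150, 155, 148, 160],
    pre=[50, 55, 45, 60],
    gain=[8, 5, 10, 3],
)
print([round(r, 3) for r in residuals])
```

A class with a positive residual gained more than its pupil factors alone would predict; under the assumptions of the text, that excess is credited to the teacher.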

Thus four criterion scores were obtained for 
each teacher, based on the four objectives 
measured, so the teachers could be ranked from 
highest to lowest on the basis of class gain in 
appreciation, attitudes, information, or interest. 
A fifth composite criterion score was obtained 
for each teacher by adding the residual gains in 

G. H. Thomson, The Factorial Analysis of Human Ability
(Boston: Houghton Mifflin Company, 1939), pp. 89-95.

27 Ibid., p. 92. “Scores to be substituted in this regression 
equation must be standard scores.” 


See Tables A9-Al3, 
in the University Library, 


scores the pre-test scores and final
test scores were treated as separate series. The standard scores
for the pretest were derived by the standard formula
$(X - M_{\text{pre-t}})/\sigma_{\text{pre-t}}$; the standard scores for the final test scores, by
the same formula applied to the final-test mean and $\sigma$. While the $\sigma$ for the pretests and final tests did


sti ‘ recal gave, however, scores that 
correlated with LaDuke’s scores by r = .98. The editor is 
indebted to Miss Harriet Wright for this calculation.) 

2 Ibid. 

% Ibid. 





JOURNAL OF EXPERIMENTAL EDUCATION [Vol. 14, No.1 


TABLE XV 
COEFFICIENTS OF CORRELATION EMPLOYED IN DEVELOPING MULTIPLE REGRESSION COEFFICIENTS 
(N = 181) 


Appreciation Attitude Interest 
.d. 4 pre-test pre-test pre-test 
. 00 ; . 59 . 62 j -15 
. 68 ‘ . 55 . 58 d .14 
.12 : —. 08 ees eh a 
.01 ‘ emtete —. 88 inlets 
. 35 .47 sae i aac ional 
. 08 : as ie bow —.48 


I 
1 


TABLE XVI 
Beta COEFFICIENTS 


For Appreciation___- 
For Attitude 
For Information... 


TABLE XVII 
CRITERIA SCORES FOR EACH TEACHER BASED ON FOUR PUPIL MEASURES AND COMPOSITE 


Apprecia- Atti- Informa- Inter- Compos- 
Teacher tion Rank tude tion Rank est Rank ite Rank 

. 8305 . 4661 . T1297 .8784 20 . 1638 
. 2355 . 8653 . 2589 . 8308 6 .1727 
. 0949 . 6350 . 8750 .4970 25 . 9181 
. 3930 . 8763 . 0536 .4058 21 .4171 

. 9161 a . 4405 . 8568 

. 8532 . . 4645 . 1968 

. 6970 . 8072 

. 1880 . 4162 

. 3513 . 4086 

. 3555 . 4956 

. 9851 { . 5692 

. 5826 . 9086 

. 6041 . 3233 

. 6694 . 1218 

. 5678 .1124 

. 0828 . 7942 

. 7381 . 8534 

. 1821 . 4701 

. 6628 . 4196 

. 6242 

. 2124 

. 4697 

. 4480 

. 2808 

. 0181 

. 5522 

. 1140 

‘ . 3785 

: —l. . 8307 

23 ‘ y 1. 2708 

17 . — .1815 . 8990 


* Teachers number 28, 31, and 39 are omitted to secure homogeneity. 
TABLE XVIII
INTERCORRELATIONS OF CRITERIA SCORES
(N = 31)


C-att. 


# To save time and space the following abbreviations are used.
C-app. = Criterion scores in Appreciation
C-att. = Criterion scores in Attitude
C-inf. = Criterion scores in Information
C-int. = Criterion scores in Interest
C-com. = Composite criterion score


the four measures.* Table XVII lists the five 
criterion scores for each teacher and the rank 
which each teacher holds with reference to the 
other teachers in each of the measures. 
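
The composite criterion and the ranking of teachers may be illustrated as follows. This is a minimal sketch assuming a simple unweighted sum of the four residual gains, as the text describes; the teacher labels and scores are hypothetical, and ties are not given any special treatment.

```python
def composite_scores(criteria):
    """Sum the four residual gains (app., att., inf., int.) for each teacher."""
    return {teacher: sum(scores) for teacher, scores in criteria.items()}


def rank_descending(scores):
    """Rank 1 goes to the teacher with the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {teacher: position + 1 for position, teacher in enumerate(ordered)}


# Hypothetical residual gains for three teachers (illustration only)
criteria = {
    "A": (0.5, 0.2, -0.1, 0.0),
    "B": (-0.3, 0.1, 0.4, 0.2),
    "C": (1.0, -0.5, -0.6, -0.2),
}
composite = composite_scores(criteria)
print(rank_descending(composite))
```

The same ranking function could be applied to any one of the four separate criteria.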

It has been previously stated that theoret- 
ically these criterion scores should be free from 
the effects of high pre-test scores. Proof that 
such freedom has been established is found in 
the coefficients of correlation between pre-test 
and residual gain (i.e. the teacher criterion 
scores). These correlations were found to be 
—.11 for information, —.11 for interest, .10 
for attitude, and —.23 for appreciation. None 
of these are statistically significant. 
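
The check described above rests on the ordinary product-moment correlation between pre-test and residual gain. A minimal sketch of that coefficient follows, together with a comparison of the four reported values against the limit of .35 given earlier for N = 31. The function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Product-moment coefficient of correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Correlations of pre-test with residual gain as reported in the text
reported = {"information": -0.11, "interest": -0.11, "attitude": 0.10, "appreciation": -0.23}
LIMIT = 0.35  # 5% limit for N = 31, as given earlier in this section
print({name: abs(r) >= LIMIT for name, r in reported.items()})
```

None of the four reported coefficients reaches the limit, which is the basis for regarding the criterion scores as free of the pre-test handicap.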


INTER-CRITERIA RELATIONSHIPS 

Since M.A. and I.Q. also entered into the
regression equations, the criterion scores are
likewise independent of these factors. Each of 
the four teacher criterion scores thus is to be 
a measure of that portion of the class gain that 
may be attributed to the teacher plus other 
factors not measured, but assumed to be con- 
stant from class to class. As has been previously 
noted, the fifth criterion score is a composite 
of the other four. 

Examination of these criterion scores and the
resulting ranking of the teachers (Table XVII)
indicates that in three of the measures (appreciation,
attitude, and information) the rankings
of the teachers are quite comparable, i.e., a
teacher ranking low in one measure ranks low
in the other measures and vice versa. The situation
in the case of the measures of interest is
quite different, the rankings showing no such
conformity to other measures.

These relationships are more definitely re- 
vealed in the intercorrelations of the various 
criterion scores which are listed in Table XVIII. 

se Raed oatinte', Frchty tea "tiehaith 
(New York: Longmans, Green and Co., 1937), p. 179. 


According to McCall:** “Scientific measure- 
ment (of teaching efficiency) is fair only when 
we measure the amount of desirable change 
produced in a pupil (1) by a given teacher . . . 
(2) in a standard time . . . (3) in standard 
pupils . . . and (4) when the measurement is 
complete.” That the criteria herein developed
approximate McCall's specifications will doubtless
be granted since (1) the pupils measured
were taught by just one teacher, (2) the time
element was the same for all classes, (3) class
(pupil) differences in I.Q., M.A., and pre-test
were controlled statistically, and (4) measurement
in the area taught (community living)
was complete in that not only information but
appreciation, attitudes, and interests were
measured.


SECTION V 


STATISTICAL VALIDITY OF SELECTED 
TEACHER MEASURES AND SUPER- 
VISORY RATINGS 


This study calls for the determination of cri- 
teria of teaching efficiency based upon objec- 
tively determined changes in pupils. The pre- 
ceding section has detailed the procedure 
through which such criteria have been deter- 
mined. This section will be devoted to com- 
parisons between the criteria and factors 
measured by teacher tests and rating scales pre- 
viously described. 


INTELLIGENCE OF TEACHERS AS A FACTOR 
RELATED TO TEACHING EFFECTIVENESS 


Among all the factors that have been com- 
pared with criteria of teaching success, intelli- 
gence, as measured by psychological tests, has 


*W. A. McCall, Measurement (New York: Macmillan Co., 
1939), p. 404. 
appeared in more studies than has any other
single factor. The test used as a measure of
teacher intelligence in this investigation was
the American Council Psychological Examination
for College Freshmen, 1936 Edition. The
teacher scores, expressed as part and total
scores, are shown in Table XXI. Comparison
of these scores with the 1936 norms* for
freshmen in teachers colleges indicates that, in
total score, the teachers ranged from about the
30th percentile to the 99th percentile. In Q1,
median, and Q3 the comparison was as presented
in Table XIX.

These teachers, then, are slightly higher in 
intelligence than are freshmen entering teacher 
training institutions. Practically the same condi- 
tion is found in relation to the different parts 
of the test. 

The correlations between the total scores and 
the criterion scores, are given in Table XX. 

L. L. Thurstone and Thelma G. Thurstone, “The 1936
Psychological Examination for College
Freshmen,” Educational Record, XVIII (April, 1937), pp.
TABLE XIX


COMPARISON OF SCORES FOR TEACHERS IN THIS
STUDY AND TEACHERS COLLEGE FRESHMEN


Freshmen in 
Teachers Teachers in 
Colleges this study 
124 155 
161 189 
235 


TABLE XX
CORRELATIONS BETWEEN TOTAL SCORES AND
CRITERION SCORES

                                         r
C-att. with total psychological        .39
C-app. with total psychological        .44
C-inf. with total psychological        .49
C-int. with total psychological
C-com. with total psychological        .48

N = 31 (pairs of scores)
Significant at 5% level if r = .36
Significant at 1% level if r =


TABLE XXI 
TEACHERS’ SCORES ON AMERICAN COUNCIL PSYCHOLOGICAL EXAMINATION 


Teacher II Ill IV 
32 46 40 
32 15 36 

26 

59 


68 
30 


50.94 
18.14 


94. 
51. 38 
It is evident that there is a significant relationship
between four of the criteria and intelligence
as measured by the above test. Pupil
change in information shows the highest rela- 
tionship (being significant at the 1% level) and 
changes in interest the lowest. Checking back 
over these tests one finds that the interest test 
had a lower reliability than the other tests, and 
that the mean class gain scores were the small- 
est, which may account, at least in part, for the 
low correlation. It will be recalled that the in- 
terest criterion differed from the other criteria.** 

Having found such significant relationship
between the intelligence factor and the criteria
of teaching effectiveness, it seemed advisable to
compare the various criteria with the different
parts of the psychological test to discover which
parts contributed most to the relationship.

These correlations are reported in Table XXII.

It is evident that certain parts of the psychological
test correlate more highly with the
various criteria than do other parts. Completion
(Part I), artificial language (Part III), and
analogies (Part IV) apparently contribute more
to the total relationship found than do arithmetic
(Part II) and opposites (Part V). Again
it is evident that the psychological examination
is not related significantly to the Interest
criterion.

Testing the practicability of applying the 
principle that a number of tests or subtests, 
significantly related to a criterion and not 

® See Section III. 


highly inter-related, form the most reliable 
predictive device, the intercorrelations between 
the teachers’ scores on the different parts of the 
psychological test were calculated (Table 
XXIII). 

These intercorrelations, being quite similar in
size to those usually reported for parts of the
psychological examination, indicated the possibility
of forming a composite of the parts of
the psychological test that were significantly
related to the criterion (Parts I, III, and IV) and
comparing this composite with the criterion.
Likewise, the individual criteria showing greatest
consistency (C-app., C-att., and C-inf.) were
composited into a new composite criterion.
Thus comparisons could be made between a
psychological examination composited from
three of its five parts (Psych. I, III, and IV)
and the two composite criteria: C1-com.
(composed of all four criteria) and C2-com.
(composed of C-app. + C-att. + C-inf.). The
following coefficients of correlation were
found:


                                           r
Psych. I, III, and IV with C1-com.       .52
Psych. I, III, and IV with C2-com.       .55


Thus by omitting two parts of the psychological
examination, its validity as a measure
of teaching efficiency with C1-com. as the
criterion was increased to .52; and with C2-com.
as a criterion it was increased to .55.


% 40 items out of 212 or 18.7% were found to be valid 
(significant at 5% level). 


TABLE XXII 
CORRELATIONS FOR PARTS OF PSYCHOLOGICAL TEST AND THE CRITERIA OF TEACHING EFFICIENCY 


Part I 
(Completion) 


* Significant at 5% level. 
** Significant at 1% level. 


Part II 
(Arithmetic) 


Part V 
(Opposites) 


Part III 
(Language) 


Part IV 
(Analogies) 
.41* 
. 46** 
.43* 
14 
- 


TABLE XXIII 
INTERCORRELATIONS BETWEEN DIFFERENT PARTS OF THE PSYCHOLOGICAL EXAMINATION 


Completion 


PFE ERAGE ea eee 


Artificial Language 
Analogies 


II Ill 


Total 
That this is an unusually high correlation
between a measure of intelligence and teaching
efficiency is seen in comparison with results of
other investigations. Somers, Boardman,
Phillips, Ullman, Knight, Whitney, and
Odenweller report, respectively, coefficients
of correlation of .43, .33, .26, .15, .09, .03,
and .00 between the intelligence measures used
and teacher effectiveness based upon supervisory
ratings.

Using the objective criterion of pupil change,
Armstrong found a correlation of .20; Barr,
Torgerson, et al. of -.19, and Rostker of
.48. The present study substantiates the findings
of Rostker that intelligence, as measured by the
American Council Psychological Examination,
appears significantly related to teaching
efficiency, at least as herein defined.


ATTITUDE OF TEACHERS TOWARD TEACHING
AS A FACTOR RELATED TO TEACHING
EFFICIENCY

Attitude toward teachers and teaching as
measured by the Yeager test may also be interpreted
as interest in the profession of teaching.
The findings of investigators show little agreement
as to the interest factor. Using expert judgment
as the criterion of teacher success, Kriner
reports interest an important factor, while Ullman
and Phillips report practically no relationship.
With pupil change as the criterion,
Barr and others report no relationship for the
Strong Vocational Interest Blank. Rostker
reports, however, a significant relationship,
and Rolfe a small positive relationship, both
using the Yeager Test.

The relationships found in this investigation,
expressed as coefficients of correlation, are
reported in Table XXIV:


TABLE XXIV


CORRELATIONS BETWEEN SCORES ON THE YEAGER TEST OF INTEREST IN
TEACHING AND THE SEVERAL CRITERIA


Attitude toward Teaching (Yeager) with


N = 31; These correlations are significant at 5%
*G, T. Somers, Pedagotical Prognosis: Predicting the Suc- 


, Contributions to Education, No. 

140 (New —~y ji 2 Publications, Teachers College, 
Columbia University, 1923). 

=Cc. W. Tests as a Measure of 


Professional 
in ” High Schools, Contributions to Educa- 
ew York: Bureau of Publications, Teachers 
one University, 1928 
Phillips, An Analysis of Certain Characteristics of 
Actine —- Prospective Teac tributions to Education, 
Ten 161 Peoeerete, Tennemee: ” George Peabody College for 


935). 

re Vilean, The Prognostic vee Z he wy x Factors 

Related iS Teaching Success (Ashland, L. Garber 
4 Frederick B. Knight, ities Related to Success in 

Teaching, Contributions to tion, No. 120 (New York: 

cape of Teachers College, Columbia University, 
22). 


Whitney, The Prediction of Teaching Success, 
Research rm No. 6 (Bloom- 

, 1924). 
ay of ee 


sohi Certain Teacher 

to T. ay F—~ re whip Effectiveness, 
. Thesis, Leland wm, 1936. 
“The Vi casurement ol 


> level ifr = .: 


None of these correlations are statistically 
significant; the correlation of scores on the 
Yeager Test with a composite of the criterion 
scores gave a correlation of .16 which is not 
statistically significant. 


KNOWLEDGE OF THE THEORY AND PRACTICE
OF MENTAL HYGIENE AS A FACTOR
IN TEACHING EFFICIENCY


In the present study the teachers’ knowledge 
of the theory and practice of mental hygiene as 
measured by the Torgerson test was correlated 
against criteria of pupil change (Table XXV). 

The correlations are low, except that with
appreciation, which reaches significance at the
5% level. They are all positive except with the
interest criterion. The correlation with a
composite of these criterion scores was .24. In view
“H. L. Kriner, Pre- ay -y Predictive of Teacher 

Pennsylvania State Studies in Education, No. 1 
ta aa Pennsylvania: Pennsylvania State College, 


- an. The Prognostic Sw Z o- Factors 
‘caching Success (Ashland, A. L. Garber 


. See ot hs 8 Super- 


tury at 
Measuremen 
mm - _ Ability,” ‘School and Society, LI (January, 1940), 
In Rostker's study the relationship was negative.
TABLE XXV
CORRELATIONS BETWEEN SCORES OF MENTAL HYGIENE AND CRITERIA OF TEACHING EFFICIENCY


Knowledge of Mental Hygiene with      r
C-att.                               .20
C-app.                               .37*
C-inf.                               .29
C-int.                              -.08
C-com.                               .24


* Significant at 5% level if r = .36

TABLE XXVI
CORRELATIONS BETWEEN PARTS OF HARNLY TEST AND CRITERIA OF TEACHING EFFICIENCY

                          C-att.
Educational Purposes       .05
Educational Policies       .12
Educational Objectives     .09
Educational Methods        .26
Educational Total          .12

N = 20; Significant at 5% level if r = .44


of the evidence presented, the validity of the 
test as a measure of teaching efficiency is low 
(significant at about the 7% level). 
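The significance thresholds printed under the tables in this section (r = .36 at the 5% level for N = 30, r = .44 at the 5% level for N = 20, and r = .46 at the 1% level for N = 31) can be checked with a short calculation. The sketch below assumes the usual t test for a product-moment correlation with N - 2 degrees of freedom; the report does not state how its thresholds were obtained, and the critical t values are taken from a standard two-tailed t table rather than from the report.

```python
import math

def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| reaching significance, given the critical t for n - 2 degrees of freedom."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Two-tailed critical t values from a standard t table (assumed inputs, not from the report).
T_05_DF28 = 2.048  # 5% level, N = 30
T_05_DF18 = 2.101  # 5% level, N = 20
T_01_DF29 = 2.756  # 1% level, N = 31

print(round(critical_r(T_05_DF28, 30), 2))  # 0.36
print(round(critical_r(T_05_DF18, 20), 2))  # 0.44
print(round(critical_r(T_01_DF29, 31), 2))  # 0.46
```

Under this assumption the computed values agree with the thresholds quoted in the table notes.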


RELATION OF EDUCATIONAL CONSERVATISM 
AND PROGRESSIVISM TO TEACHING 
EFFICIENCY 


No mention has been found in previous investigations
of the factor of conservatism. As
has been pointed out in the description of the
Harnly test, used to measure this factor, the
teacher's conservatism or progressivism with
reference to educational purposes, policies,
objectives, and methods was measured. The
relationships to the criteria, with the r's reflected
so that positive r's show a liberal position and
negative r's a conservative position, are given in
Table XXVI.

Evidently whether a teacher is liberal or conservative
with reference to educational purposes,
policies, and objectives makes little
difference to pupil changes in attitudes,
appreciations, information, and interests as
measured in this study. The correlations between
these criteria and educational methods are
somewhat higher. Since all of these correlations
are negative, it appears that the better teachers
tend to be more conservative than the poorer
teachers.


RELATION OF SOCIAL PROFICIENCY OF THE 
TEACHERS TO TEACHER EFFICIENCY 


Various personality factors have been studied
in relation to teaching efficiency. Social intelligence
as measured by the Social Intelligence
Test by Moss and others was found insignificantly
related by Ullman and Barr, with
coefficients of correlation of .18 and .19 respectively.
Personality, subjectively estimated,
was found significantly related by Somers and
Odenweller, with coefficients of correlation of
.62 and .83 respectively when the criterion was
determined by subjective expert judgment. In
the latter comparison, the judgments of personality
and teaching efficiency were made by
the same supervising officers.

See Section III of this report.

TABLE XXVI (continued)

                          C-int. C-com.   C-inf.   C-app.
Educational Purposes          .06           .33      .08
Educational Policies          .03           .16      .03
Educational Objectives       -.06           .19      .09
Educational Methods          -.23          -.14     -.38
Educational Total            -.10           .16      .04

Social proficiency as here reported was
measured by Jackson's “Social Proficiency
Test.” The relationship between this test and
teaching efficiency is indicated in Table XXVII.

Interpreted broadly, the above results mean
that the teacher who secures the greatest desirable
change in pupils' attitudes, appreciations,
information, and interests is inclined to be less
considerate of others as here measured than is
the teacher who is less effective.


THE VALIDITY OF SUPERVISORY RATINGS AS 
MEASURES OF TEACHING EFFICIENCY 


Through the cooperation of the county super- 
intendent and supervising teachers having 
supervisory jurisdiction over the teachers who 


R. R. Ullman, The Prognostic Value of Certain Factors
Related to Teaching Success (Ashland, Ohio: A. L. Garber).
A. S. Barr, T. L. Torgerson, and others, “The Validity
of Certain Instruments Employed in the Measurement
of Teaching Ability,” in The Measurement of Teaching
Efficiency (New York: Macmillan Co., 1935), pp. 73-141.
G. Somers, Predicting the Success of Prospective Teachers,
Contributions to Education, No. 140 (New York: Bureau of
Publications, Teachers College, Columbia University, 1923).
A. L. Odenweller, Predicting the Quality of Teaching,
Contributions to Education, No. 676 (New York: Bureau of
Publications, Teachers College, Columbia University, 1936).
TABLE XXVII
CORRELATIONS BETWEEN SCORES ON JACKSON SOCIAL PROFICIENCY TEST AND
CRITERIA OF TEACHING EFFICIENCY


N = 20
Significant at 5% level if r = .44


TABLE XXVIII 
TEACHER RATINGS BY COUNTY SUPERINTENDENT AND SUPERVISING TEACHER 


County Superintendent 
Torgerson Michi 
Scale 


e 
610 


457 


555. 65 


60.10 . 
109. 82 


10.77 


Almy-S. 
Scale 


Supervising Teacher 
Torgerson Michigan Almy-S. 
Scale Scale 


177 
457 


569. 33 


58. 63 ‘ 
126. 40 


0 
136. 37 , 
13. 87 


26. 58 


Teachers numbered 28, 31, and 39 are omitted to secure homogeneity.


participated in this study, it was possible to
secure three separate ratings on each teacher
from each of two raters. These ratings, based
on the Torgerson, Michigan, and Almy-Sorenson
teacher rating scales, are reported in
Table XXVIII.


Since composites of the three separate ratings by the
superintendent and of the three ratings by the
supervising teacher were desired in the analysis
of these scores, the raw scores were changed to
standard or sigma scores, and all calculations
and comparisons were made from these standard scores.
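The conversion to standard scores and the averaging of three ratings into a composite can be sketched as follows. This is only an illustration: the raw ratings are hypothetical (they are not the values of Table XXVIII), the function names are invented for the example, and the use of the population standard deviation (dividing by N) is an assumption, since the report does not say which form was used.

```python
from statistics import mean, pstdev

def standard_scores(raw):
    """Convert raw scores to standard (sigma) scores: (score - mean) / standard deviation."""
    m = mean(raw)
    s = pstdev(raw)  # population SD, an assumption for this sketch
    return [(x - m) / s for x in raw]

def composite(*scales):
    """Average the standard scores each teacher received on several rating scales."""
    z_by_scale = [standard_scores(scale) for scale in scales]
    return [mean(z_values) for z_values in zip(*z_by_scale)]

# Hypothetical raw ratings of three teachers on three scales with different units.
torgerson = [500, 560, 620]
michigan = [100, 110, 120]
almy_sorenson = [120, 135, 150]

composite_rating = composite(torgerson, michigan, almy_sorenson)
print([round(z, 3) for z in composite_rating])  # [-1.225, 0.0, 1.225]
```

Because each scale is first expressed in units of its own standard deviation, scales with very different raw ranges contribute equally to the composite.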

$\dfrac{\text{Raw Score} - \text{Mean Score}}{\sigma} = \text{standard score}$. “Standard scores
may be added, subtracted, or averaged.” Henry E. Garrett,
Statistics in Psychology and Education, Second Edition (New
York: Longmans, Green and Co., 1937).
The primary purpose of securing supervisory
ratings was to compare such ratings with the
objectively determined pupil change criteria.

If the superintendents' three ratings of the
teachers intercorrelated highly, then a composite
of the three could be compared with the criteria
of teaching. The same logic applies to the
supervising teachers' ratings. The intercorrelations
indicated were calculated (Table
XXIX).


TABLE XXIX

INTERCORRELATIONS BETWEEN SUPERVISORY RATINGS

                                 Co. Supt.   Supervising Teacher
Almy-Sorenson with Michigan         .53            .73
Almy-Sorenson with Torgerson        .63            .88
Michigan with Torgerson             .70            .65

N = 31
Significant at 1% level if r = .46
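Since the ratings were expressed as standard scores, each correlation between two sets of ratings reduces to the mean product of paired standard scores. The sketch below illustrates that calculation with hypothetical ratings; the data and function names are invented for the example, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def standard_scores(raw):
    """(score - mean) / population standard deviation for every score."""
    m = mean(raw)
    s = pstdev(raw)
    return [(x - m) / s for x in raw]

def correlation(x, y):
    """Product-moment r computed as the mean product of paired standard scores."""
    zx = standard_scores(x)
    zy = standard_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Hypothetical ratings of four teachers by the same rater on two scales.
scale_a = [1, 2, 3, 4]
scale_b = [2, 4, 6, 8]
print(round(correlation(scale_a, scale_b), 3))       # 1.0, perfect agreement in rank and spacing
print(round(correlation([1, 2, 3], [1, 3, 2]), 3))   # 0.5
```

A high coefficient between two scales used by the same rater is what would justify pooling them into a composite rating.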


Correlations between each of the ratings and
each of the criteria were computed (Table
XXX).


If pupil change as here measured is accepted
as a valid criterion of teaching efficiency, then
the supervisory ratings here provided are
invalid.

The correlations for composites of the
superintendents' and supervising teachers'
ratings are reported in Table XXXI.

These results substantiate the above conclusion
with reference to supervisory ratings.

Changing the point of emphasis for the time
being, and thinking of the supervisory ratings
as the criterion of teaching ability, the other
teacher measures were compared with this criterion.
These correlations are reported in Table
XXXII.

The results indicate clearly that there is no
significant relationship between any of the
teacher measures used in this investigation and
teaching efficiency as measured by supervisory
ratings. The results on the Harnly test of liberal
and conservative viewpoint approach significance.

Table XXXIII summarizes the findings pre- 
sented in this section concerning the relation 
of various teacher factors to the criteria of 
teaching efficiency and the statistical validity of 


TABLE XXX 
CORRELATIONS BETWEEN SUPERVISORY RATINGS AND CRITERIA OF TEACHING EFFICIENCY * 


Superintendents’ Ratings 
Torgerson 


Almy-S. Michigan 
—.01 
—.10 
. 24 
—.00 
—.01 


Significant at 5% level ifr = .36 


TABLE XXXI 


CORRELATIONS BETWEEN COMPOSITE OF SUPER- 
VISORY RATINGS AND CRITERIA OF 
TEACHING EFFICIENCY 


Composite Composite 
Supt. Sup. 
Rating Teachers’ 
Rating 
—. 03 


—.16 
—_25 


Since these ratings were expressed as standard scores, the
following correlation formula was used (from E. F. Lindquist,
A First Course in Statistics, p. 150):

$$r = \frac{\sum xy}{N}$$


Supervising Teachers' Ratings

Almy-S.    Michigan    Torgerson
 -.29        -.35        -.23
 -.27        -.10        -.27
 -.02        -.21         .04
 -.19        -.30        -.05
 -.24        -.30        -.14


the instruments used to secure a measure of 
these factors. 

It is evident from these data that the statis- 
tical validity of only one of the measures, 
namely the American Council Psychological 
Examination, meets the statistical significance 
essential to an instrument useful for predictive 
purposes. How effective this test would be for 
predictive purposes is shown by comparing the 
upper and lower fourths of the teachers selected 
on the basis of this test (Psych. I, III, and IV) 
with the upper and lower fourths of the cri- 
terion group (C-com.-3). On the basis of 
chance, the upper fourth on the test would be 
found distributed equally in the four quarters 
of the criterion group. 
TABLE XXXII

CORRELATIONS BETWEEN CERTAIN TEACHER MEASURES AND COMPOSITES
OF SUPERVISORY RATINGS

Intelligence (American Council Psych.)*
Knowledge of Mental Hygiene (Torgerson)*
Attitude toward Teaching (Yeager)*
Conservatism (Harnly)**

*N = 30; Significant at 5% level if r = .36
**N = 20; Significant at 5% level if r = .44


ite’ Composite Sup. 
ting Teacher Rating 


Com 
Supt. 
—.12 
Pe 
—.15 
.39 
—.389 
.01 


TABLE XXXIII

STATISTICAL VALIDITY OF TEACHER
WHEN CRITERION IS
DESIRABLE CHANGE IN

American Council Psychological
    Part 1 Completion
    Part 2 Arithmetic
    Part 3 Artificial Language
    Part 4 Analogies
    Part 5 Opposites
    Composite of Parts 1, 3 and 4
Knowledge of Mental Hygiene (Torgerson)
Attitude Toward Teaching (Yeager)
Social Proficiency (Jackson)
Liberal (progressive) Viewpoint (Harnly)
    As to Educational Purposes
    As to Educational Policies
    As to Educational Objectives
    As to Educational Method

Statistical Validity
C-com.(4)a    C-com.(3)b
.425*
.319
.240
.476**
.485**

a Criterion composited from C-app., C-att., C-inf., and C-int.
b Criterion composited from C-app., C-att., C-inf.

* Statistically significant at 5% level.
** Statistically significant at 1% level.
*** The sign has been reflected.


Comparing upper and lower fourths, the
following results were obtained:
% of upper fourth on test found in
upper fourth of criterion
% of upper fourth on test found in
lower fourth of criterion
% of lower fourth on test found in
lower fourth of criterion
% of lower fourth on test found in
upper fourth of criterion


121.4% 
50% 
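The comparison of upper and lower fourths described above can be illustrated with a small sketch. The scores are hypothetical and the function names are invented for the example; ties are broken by position, which is an arbitrary choice. With no relationship between test and criterion, each percentage would be expected to fall near 25.

```python
def fourths(scores):
    """Label each score 1 (lowest fourth) to 4 (highest fourth) by rank order."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 4 // n + 1
    return labels

def percent_found(test, criterion, test_fourth, criterion_fourth):
    """Percent of teachers in one fourth on the test who fall in a given fourth of the criterion."""
    t = fourths(test)
    c = fourths(criterion)
    group = [i for i in range(len(test)) if t[i] == test_fourth]
    hits = sum(1 for i in group if c[i] == criterion_fourth)
    return 100.0 * hits / len(group)

# Hypothetical scores for eight teachers (not data from the study).
test_scores = [10, 20, 30, 40, 50, 60, 70, 80]
criterion_scores = [5, 2, 3, 4, 1, 6, 8, 7]

print(percent_found(test_scores, criterion_scores, 4, 4))  # upper fourth found in upper fourth: 100.0
print(percent_found(test_scores, criterion_scores, 4, 1))  # upper fourth found in lower fourth: 0.0
print(percent_found(test_scores, criterion_scores, 1, 1))  # lower fourth found in lower fourth: 50.0
```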


How does this compare with other instruments
used for predictive purposes? Apropos of
intelligence tests and prediction, Freeman
says:

“The correlation between intelligence
tests and composite standing of pupils may
be said, then, to lie usually between .40 and
.60. Probably in the majority of cases the
correlations will be found in the neighborhood
of .50, but under very favorable conditions,
it may be somewhat above this. The
practical meaning of this correlation is that
it enables us with moderate degree of accuracy
to predict the grade of work which a
student will do in school or college.”

Frank N. Freeman, Mental Tests (Boston: Houghton
Mifflin Company, 1926), p. 372.


In recent years colleges and universities have
been using “aptitude” or “scholarship” tests for
purposes of predicting college grade point
average. In his study of such prediction devices
used at the University of Wisconsin, Froelich
found the validity of the Wisconsin Achievement
Test as an instrument for prediction of
scholastic success for first semester freshmen
to be .61, while the validities of secondary
school rank, the American Council Psychological
Examination, and the Henmon-Nelson Test
were .62, .55, and .48 respectively. From his
observation that such a test is “as good a
measure of university success as any other
available measure,” it is apparent that predictive
measures of really high validity are difficult
to develop, and that the relationship found
in this study between parts of the American
Council Psychological Examination and the
criterion of teaching efficiency is relatively high.

Gustav J. Froelich, The Validity of the Wisconsin
Achievement Test as an Instrument for Prediction of Scholastic
Success at the University of Wisconsin, unpublished
thesis (Madison, Wis.: University of Wisconsin, 1940).

TABLE XXXIV
INTERCORRELATIONS AMONG TEACHER MEASURES AND THE CRITERION

X0  Criterion (pupil change)
X1  American Council Psychological, Parts I, III & IV
X2  Torgerson
X3  Jackson Social Proficiency
X4  Harnly (Method Part)

        X1      X2      X3      X4
X0     .550    .315   -.380    .318*

* The sign of this r has been reflected; a positive r here signifies that conservative teachers secured
higher pupil change scores.


In a further attempt to discover what measures
might be used to give the best measure of
teaching efficiency and how to combine these
to give the highest correlation with the criterion,
four of the measures that gave the highest
zero order correlations were combined in a
regression equation. The intercorrelations from
which this regression equation was calculated are
given in Table XXXIV.


The beta coefficients obtained by the Aitken
method are as follows:

$$\beta_{01} = .539 \qquad \beta_{03} = -.405$$
$$\beta_{02} = .291 \qquad \beta_{04} = .329$$

The regression equation in standard score
form is therefore:

$$z_0 = .539z_1 + .291z_2 - .405z_3 + .329z_4$$

The multiple correlation can be immediately
calculated as follows:

$$R = \sqrt{.539(.550) + .291(.315) + (-.405)(-.380) + .329(.318)} = .804$$
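The arithmetic of the multiple correlation can be checked with a few lines of code. The sketch below uses the beta weights and zero-order correlations printed above and the relation R = sqrt(sum of beta times r) for a regression in standard score form; the function names and the example teacher are hypothetical.

```python
import math

# Beta weights and zero-order correlations with the criterion, as printed in the text.
betas = [0.539, 0.291, -0.405, 0.329]
validities = [0.550, 0.315, -0.380, 0.318]

def multiple_r(betas, validities):
    """Multiple correlation for a standard-score regression: sqrt(sum(beta_i * r_0i))."""
    return math.sqrt(sum(b * r for b, r in zip(betas, validities)))

def predicted_criterion(betas, z_scores):
    """Predicted criterion standard score from a teacher's standard scores on the four measures."""
    return sum(b * z for b, z in zip(betas, z_scores))

print(round(multiple_r(betas, validities), 3))  # 0.804

# Hypothetical teacher one standard deviation above the mean on X1 and average elsewhere.
print(round(predicted_criterion(betas, [1.0, 0.0, 0.0, 0.0]), 3))  # 0.539
```

The sum under the radical is about .647, so the four measures together account for roughly 65 per cent of the variance in the criterion.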


This multiple R represents considerable gain
over that reported by Rostker and Rolfe, in
that four measures here give a better correlation
than that secured by Rolfe for eleven
measures and equal to the correlation secured
by Rostker from ten measures. For fourteen
measures Rostker secured a multiple R of .85.

Charles C. Peters and Walter R. VanVoorhis, Statistical
Procedures and Their Mathematical Basis (New York:
McGraw-Hill Book Co., 1940), p. 227.
The author is indebted to A. S. Barr, Ronald D.
Jones, and L. Joseph Lins for this calculation.


SECTION VI 


SUMMARY, CONCLUSIONS, AND 
LIMITATIONS 


PURPOSE 


The purpose of this study was to determine 
the validity of certain teacher tests and rating 
scales as measures of teaching efficiency when 
pupil change is employed as the criterion. 

From the results of this investigation, the 
following generalizations are offered: 


1. Valid criteria of teaching efficiency based 
upon objectively determined pupil change in 
different aspects of the various subject areas, 
may be determined only with difficulty. The 
validity of the criteria will be limited by the 
validity and reliability of the pupil tests used. 
As better instruments for measuring pupil 
change are constructed, including reactions 
other than those that may be registered through 
paper and pencil tests, better measurement of 
teaching may result. 

It should here be emphasized that the criteria
of teaching efficiency objectively determined
in this study might be more appropriately
labeled “criterion of teaching appreciations
as related to Community Living,” “criterion
of teaching information related to Community
Living,” etc., since but one area of
teaching is involved. Since there is no evidence
to show that teaching efficiency in the area
studied is directly related to efficiency in other
areas, no such inferences are drawn with reference
to these areas.
2. Intelligence of teachers as measured by
the total score and part scores on the American
Council Psychological Examination is significantly
related to teaching efficiency as measured
here (.61).

3. Professional knowledge of the theory and 
practice of mental hygiene is positively but not 
significantly related to teaching efficiency (.35). 

4. Whether a teacher is liberal or conservative
with reference to educational objectives,
purposes, and policies seems to make
little difference to her efficiency. There is a
tendency for the efficient teacher to be conservative
in her teaching methods (-.32).

5. The teacher's attitude toward her profes- 
sion or toward her fellow teachers as herein 
measured showed little relationship to her 
efficiency (.16). 

6. Teachers who are the more considerate of 
others as here measured tend to be inefficient, 
although the relationship is not statistically 
significant (—.35). 

7. Ratings of teaching efficiency by superin- 
tendents and supervising teachers do not agree 
with the criterion of pupil gain. 
8. The use of different rating scales by the
same rater on the same teachers results in
considerable difference in the teacher ranking.


RECOMMENDATIONS 


The outcomes of this study make the general
problem of the measurement of teaching efficiency
more challenging than before. The technique
of securing pupil change attributable to
the teacher has been somewhat clarified and
simplified. The principal weakness of the study
lay in the fact that pupil change, and therefore
teaching efficiency, was determined for but a
small part of the complete school experience
of the pupils.

Future studies of this type should extend their
criteria to include all of the pupil's school activities.
Using the findings of this and similar
studies, investigators should construct measures
which might more nearly measure factors
accompanying or paralleling teacher success.


[The remainder of the studies in this series
will be reported in the December issue.]